Exponential Concentration Inequalities for Additive Functionals of Markov Chains



Radosław Adamczak and Witold Bednorz
University of Warsaw 

January 18, 2012 



Abstract 

Using the renewal approach, we prove exponential inequalities for additive functionals and empirical processes of ergodic Markov chains, thus obtaining counterparts of inequalities for sums of independent random variables. The inequalities do not require the functions of the chain to be bounded; moreover, all constants are explicit whenever the usual drift condition holds, which is crucial for practical applications, e.g. in MCMC algorithms.
1 Introduction

This paper concerns exponential-type concentration inequalities for additive functionals of Markov chains, i.e. for sums of the form

$$f(X_0) + \dots + f(X_{n-1}),$$

where $(X_i)_{i \ge 0}$ is a Markov chain.

Such inequalities are widely used in Markov Chain Monte Carlo theory to provide estimates of the rate of convergence of certain algorithms. Moreover, in certain statistical applications (e.g. for M-estimators) one is interested in estimates of suprema of such functionals over classes $\mathcal{F}$ of functions.

The concentration phenomenon in the Markov chain setting has been studied in many papers, to mention […]. Clearly, in general one cannot hope to recover the classical results for sums of independent random variables in their full strength. Therefore the goal is to provide counterparts of the inequalities for independent summands under conditions which are relatively easy to verify and involve only 'computable' characteristics of the chain. From the practical point of view, it is also important to derive estimates with explicit and 'reasonable' constants.

Among the most successful approaches developed to obtain deviation inequalities for Markov chains one can list the transportation of measure method (see [15, 21]), martingale approximation [12] and renewal theory based on the splitting technique introduced by Nummelin [17] and Athreya and Ney [3] (for a general introduction see […]).

In this paper we follow the renewal approach, i.e. we decompose the sum of functionals of a Markov chain into excursions from and to an existing or artificially created atom. This concept allows one to reduce the question of exponential inequalities for Markov chains to the concentration of sums of independent (or nearly independent) random variables, at the expense of some further technical work. Moreover, under additional assumptions, it allows one to incorporate into the estimates the limiting variance of the rescaled functional and thus to obtain Bernstein-type inequalities corresponding to the central limit theorem. For the reader's convenience and further use we recall below the classical formulation for independent summands.

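Bernstein's inequality. Let $(\xi_i)_{i=0}^{n-1}$ be independent random variables such that $\mathbb{E}\xi_i = 0$ and $\|\xi_i\|_\infty \le M$. Let $\sum_{i=0}^{n-1}\mathbb{E}\xi_i^2 = \sigma^2$; then for any $t > 0$,

$$\mathbb{P}\Big(\Big|\sum_{i=0}^{n-1}\xi_i\Big| \ge t\Big) \le 2\exp\Big(-\frac{t^2}{2(\sigma^2 + Mt/3)}\Big).$$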
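As a quick numerical illustration (a sketch, not part of the original argument), the snippet below evaluates the right-hand side of Bernstein's inequality and compares it with a Monte Carlo estimate of the two-sided tail for sums of independent Rademacher signs, for which $M = 1$ and $\sigma^2 = n$. The function names and the choice of Rademacher summands are our own assumptions.

```python
import math
import random


def bernstein_bound(t: float, sigma2: float, M: float) -> float:
    """Right-hand side 2 exp(-t^2 / (2 (sigma^2 + M t / 3))) of Bernstein's inequality."""
    return 2.0 * math.exp(-t * t / (2.0 * (sigma2 + M * t / 3.0)))


def rademacher_tail(n: int, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|xi_0 + ... + xi_{n-1}| >= t) for i.i.d. random signs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(s) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 100  # each sign has variance 1, so sigma^2 = n; |xi_i| <= M = 1
    for t in (20.0, 30.0):
        print(t, rademacher_tail(n, t, trials=2000), bernstein_bound(t, float(n), 1.0))
```

The simulated frequencies should sit well below the bound; the gap reflects that Bernstein's inequality only uses the variance and the uniform bound $M$.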



If one insists on having an estimate in terms of the variance, Bernstein's inequality requires all random variables to be bounded, which is rather restrictive. Clearly one expects similar results to hold under more general integrability assumptions on the summands, e.g. when $\mathbb{E}\exp(|\xi_i|^\alpha/c^\alpha) \le 2$ for some $\alpha > 0$ (which corresponds to finiteness of exponential Orlicz norms). This is indeed the case, though the inequality is somewhat weaker. In the more general context of empirical processes this was proved in [1].

The approach of [1] relied strongly on difficult inequalities due to Talagrand [22, …], obtained via the isoperimetric approach, and consequently did not provide estimates of the constants involved. As already mentioned, accessible constants are crucial for practical applications; therefore we will partly reprove the result of [1] using a more direct method. In the real-valued case we will not need Talagrand's results, whereas for suprema of additive functionals we will use Talagrand's inequality only in the version for bounded functions, for which good constants are known [11].

In particular, we will show (Proposition 3, where a variant for sums stopped at a stopping time $N \le n$ is also given) that for 1-dependent mean zero variables $(\xi_i)_{i \ge 0}$ with variance $\sigma^2$ that satisfy the assumptions mentioned above for some $\alpha \in (0, 1]$, we have

$$\mathbb{P}\Big(\sup_{k \le n/m}\Big|\sum_{i=1}^{k}\xi_{i-1}\Big| > t\Big) \le 2e^8\exp\Big(-\frac{t^\alpha}{2(4c)^\alpha}\Big) + 4\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2 + Mt/6)}\Big),$$

where $m$ is a positive integer (which in subsequent applications will correspond to one of the parameters of the Markov chain) and $M = c(3\alpha^{-2}\log(n/m))^{1/\alpha}$.
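The right-hand side above is explicit, so it can be evaluated directly. The sketch below does this; the function names are hypothetical, and the convention $\log x := \log(x \vee e)$ (adopted in Section 4) is assumed for $\log(n/m)$. The first term corresponds to the heavy ($\psi_\alpha$) part of the tail, the second to the Bernstein-type part.

```python
import math


def log_ve(x: float) -> float:
    """log(max(x, e)), i.e. the convention log x := log(x v e) used in Section 4."""
    return math.log(max(x, math.e))


def truncation_level(c: float, alpha: float, n: int, m: int) -> float:
    """M = c (3 alpha^{-2} log(n/m))^{1/alpha}."""
    return c * (3.0 * log_ve(n / m) / alpha ** 2) ** (1.0 / alpha)


def one_dependent_tail_bound(t: float, n: int, m: int, sigma2: float,
                             c: float, alpha: float) -> float:
    """2 e^8 exp(-t^a / (2 (4c)^a)) + 4 exp(-t^2 / (32 (ceil(n/(2m)) sigma^2 + M t / 6)))."""
    M = truncation_level(c, alpha, n, m)
    heavy = 2.0 * math.exp(8.0 - t ** alpha / (2.0 * (4.0 * c) ** alpha))
    blocks = math.ceil(n / (2 * m))  # number of summands in each parity class
    gaussian = 4.0 * math.exp(-t * t / (32.0 * (blocks * sigma2 + M * t / 6.0)))
    return heavy + gaussian


if __name__ == "__main__":
    for t in (500.0, 1000.0, 2000.0):
        print(t, one_dependent_tail_bound(t, n=10_000, m=1, sigma2=1.0, c=1.0, alpha=1.0))
```

For small $t$ the value exceeds 1, so the bound only becomes informative for larger deviations.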

Following the general regeneration approach to Markov chains, one can apply the above inequality to excursions from and to an existing or artificially constructed atom of the chain. For all ergodic Markov chains the split chain construction [18] guarantees that it is possible to enlarge the space the Markov chain lives in so that a true atom exists for one of the chain's skeletons. However, implementing this strategy requires some additional work. In particular, one needs effective criteria to prove that $\mathbb{E}\exp(|\xi_i|^\alpha/c^\alpha) \le 2$ with an accessible $c$.
Consequently, in the simplest case we provide bounds on $\mathbb{P}_x(|\sum_{k=0}^{n-1} f(X_k)| > t)$ in a form close to Bernstein's inequality, with constants that depend on the drift condition only, and which work for functions that are controlled from above by the drift function. The general result will be stated as Theorem 5. It asserts that for $t > 0$,

$$\mathbb{P}_x\Big(\Big|\sum_{i=0}^{n-1} f(X_i)\Big| > 3t\Big) \le 2\exp\Big(-\Big(\frac{t}{a}\Big)^\alpha\Big) + 2\pi^*(\mathfrak{a})^{-1}\exp\Big(-\Big(\frac{t}{b}\Big)^\alpha\Big) + 2e^8\exp\Big(-\frac{t^\alpha}{2(4c)^\alpha}\Big) + 4\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2 + Mt/6)}\Big)$$

with $M = c(3\alpha^{-2}\log(n/m))^{1/\alpha}$, where $m, a, b, c$ are certain parameters of the chain and $\sigma^2 = \mathbb{E}s_0^2$ is the variance of the block appearing in the regeneration construction.

We also provide tools to efficiently estimate the parameters involved in the above inequality by means of drift conditions, which in the Markov chain setting are the most convenient and widely used technique for proving exponential integrability. Therefore, to provide an accessible tool, we analyze how to deduce bounds on $a, b, c$ from different types of drift criteria. It is known [13] that for $\alpha = 1$ one cannot, in general, avoid multiplicative (geometric) drift conditions, which are usually difficult to work with. To avoid this obstacle we show that the usual drift condition also provides a bound (although with $\alpha < 1$). The simplest setting we describe is that of geometric ergodicity, where we exhibit a class of functions, depending only on $\alpha$ and the drift function, for which the above ideas work and strong concentration results can be obtained. Again, we take care to provide reasonable constants in the estimates, which allow for practical applications.

We also consider refined versions of the above inequality and show that under additional assumptions (strong aperiodicity and geometric ergodicity) one can obtain inequalities in which the subgaussian coefficient (responsible for the tail behavior for 'small' values of $t$) can be arbitrarily close to the asymptotic variance appearing in the central limit theorem. We also prove similar inequalities for suprema of empirical processes, again recovering the right asymptotic weak variance (which in the Donsker case corresponds to integrability properties of the limiting Gaussian process).

The organization of the paper is as follows: in Section 2 we discuss the Markov chain theory and the variety of integrability conditions we need to prove exponential concentration, and in Section 3 we characterize them in terms of drift conditions; in Section 4 we prove the main tool, which is an exponential inequality for almost independent random variables; finally, in Section 5 the previous results are used to prove exponential concentration inequalities for Markov chains.



2 The exponentially fast decaying tails of Markov Chain excursions 
2.1 Notation and preliminaries 

Let $X = (X_k : k \in \mathbb{Z}_+)$ be a time-homogeneous Harris ergodic Markov chain defined on the state space $(\mathcal{X}, \mathcal{B})$ (to avoid certain measurability issues we will assume that $\mathcal{B}$ is countably generated, which is enough for all the applications we have in mind) and let $X^m$ denote its $m$-skeleton. We will denote by $(\Omega, \mathcal{F}, \mathbb{P})$ the general probability space on which the process is defined and by $\mathbb{P}_\mu$ its conditional version where the starting distribution equals $\mu$, i.e. $\mathbb{P}_\mu(X_0 \in A) = \mu(A)$, $A \in \mathcal{B}$. For simplicity we write $\mathbb{P}_x$ whenever $\mu = \delta_x$. Let $P(x, A)$, $x \in \mathcal{X}$, $A \in \mathcal{B}$, denote the transition function of the chain and let $P$ also denote the operator on measurable functions given by $(Pf)(x) = \mathbb{E}_x f(X_1) = \int f(y)P(x, dy)$. Since we are interested in exponential inequalities, let us also recall the definition of the exponential Orlicz norms,

$$\|Y\|_{\psi_\alpha,\mu} = \inf\Big\{c > 0 : \mathbb{E}_\mu\exp\Big(\frac{|Y|^\alpha}{c^\alpha}\Big) \le 2\Big\}.$$

Note that the subscript above indicates the measure with respect to which the Orlicz norm is taken.
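For intuition about the $\psi_\alpha$ norm, one can compute it for an empirical measure by bisection, since $c \mapsto \mathbb{E}\exp(|Y|^\alpha/c^\alpha)$ is decreasing. The sketch below is our own illustration (the helper names are hypothetical); for a point mass at $y$ it recovers $\|y\|_{\psi_\alpha} = |y|/(\log 2)^{1/\alpha}$.

```python
import math


def _mean_exp(ys, c, alpha):
    """Empirical mean of exp((|y|/c)^alpha); returns inf when a term would overflow."""
    total = 0.0
    for y in ys:
        z = (y / c) ** alpha
        if z > 700.0:  # math.exp overflows near 709.8, and then the constraint fails anyway
            return math.inf
        total += math.exp(z)
    return total / len(ys)


def empirical_psi_alpha_norm(samples, alpha: float, rel_tol: float = 1e-12) -> float:
    """inf{c > 0 : mean(exp((|y|/c)^alpha)) <= 2} for the empirical measure of `samples`."""
    ys = [abs(float(y)) for y in samples]
    if max(ys) == 0.0:
        return 0.0
    lo, hi = 0.0, max(ys)
    while _mean_exp(ys, hi, alpha) > 2.0:  # grow hi until the constraint holds
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:  # invariant: lo infeasible (or 0), hi feasible
        mid = 0.5 * (lo + hi)
        if _mean_exp(ys, mid, alpha) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(empirical_psi_alpha_norm([1.0] * 10, alpha=1.0), 1 / math.log(2))
```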

Let us now briefly recall the regeneration method. We outline only the main points needed for our further applications and refer to [18, …] for an extensive exposition.

The general theory of Markov chains states that whenever the chain is aperiodic and there exists an invariant probability measure $\pi$ on $(\mathcal{X}, \mathcal{B})$, the small set condition is satisfied, i.e. there exist $C \in \mathcal{B}$ with $\pi(C) > 0$, a probability measure $\nu$ with $\nu(C) > 0$, $\delta > 0$ and an integer $m \ge 1$ such that

$$P^m(x, B) \ge \delta\nu(B), \quad x \in C,\ B \in \mathcal{B}. \qquad (1)$$

Moreover, the $m$-step chain $(X_{km} : k \in \mathbb{Z}_+)$ may be split to form a chain that possesses a recurrent atom. The construction is well known and as references we recommend [18, …].

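The split chain construction is based on introducing variables $Y_k \in \{0, 1\}$ that denote the level of the split $m$-skeleton at time $km$. Let $\tilde{X}^m = (\tilde{X}^m_k : k \in \mathbb{Z}_+) = ((X_{km}, Y_k) : k \in \mathbb{Z}_+)$. The new chain is defined by the conditional probabilities

$$\mathbb{P}(Y_k = 1, X_{km+1} \in dx_1, \dots, X_{(k+1)m-1} \in dx_{m-1}, X_{(k+1)m} \in dy \mid \mathcal{F}^X_{km}, \mathcal{F}^Y_{k-1}, X_{km} = x)$$
$$= \mathbb{P}(Y_0 = 1, X_1 \in dx_1, \dots, X_{m-1} \in dx_{m-1}, X_m \in dy \mid X_0 = x) = \frac{\delta 1_C(x)\nu(dy)}{P^m(x, dy)}P(x, dx_1)P(x_1, dx_2)\cdots P(x_{m-1}, dy),$$

where $\mathcal{F}^X_{km} = \sigma((X_i)_{i \le km})$ and $\mathcal{F}^Y_{k-1} = \sigma((Y_j)_{j \le k-1})$. The above condition simply means that we arrange the conditional distribution of the intermediate parts of the chain so that they fit the split $m$-skeleton. Note that

$$\mathbb{P}(Y_k = 1, X_{(k+1)m} \in dy \mid \mathcal{F}^X_{km}, \mathcal{F}^Y_{k-1}, X_{km} = x) = 1_C(x)\delta\nu(dy),$$

and consequently $\mathbb{P}(Y_k = 1 \mid \mathcal{F}^X_{km}, \mathcal{F}^Y_{k-1}, X_{km} = x) = \delta 1_C(x)$ and $\mathbb{P}(X_{(k+1)m} \in dy \mid \mathcal{F}^X_{km}, \mathcal{F}^Y_{k-1}, Y_k = 1) = \nu(dy)$. Therefore, whenever $X_{km}$ enters $C$, with probability $\delta$ we decide on $Y_k = 1$, and if so we distribute $X_{(k+1)m}$ according to the measure $\nu$.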
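For a finite state space and $m = 1$ the splitting can be written down explicitly. The sketch below is our own illustration (the kernel, the small set and the function names are hypothetical choices): it forms the residual kernel $R(x, \cdot) = (P(x, \cdot) - \delta\nu)/(1-\delta)$ on $C$, samples one step of $(X_k, Y_k)$ as described above, and lets one check that mixing the two branches recovers $P$, i.e. that $(X_k)$ still has transition function $P$.

```python
import random


def residual_kernel(P, C, delta, nu):
    """Rows (P(x,.) - delta*nu) / (1 - delta) for x in C (when delta < 1), P(x,.) otherwise."""
    R = []
    for x, row in enumerate(P):
        if x in C and delta < 1.0:
            new = [(p - delta * q) / (1.0 - delta) for p, q in zip(row, nu)]
            if min(new) < -1e-12:
                raise ValueError("minorization P(x, .) >= delta * nu fails on C")
            R.append([max(v, 0.0) for v in new])  # clip rounding noise
        else:
            R.append(list(row))
    return R


def split_step(x, P, R, C, delta, nu, rng):
    """One step of the split chain with m = 1: returns (Y_k, X_{k+1}) given X_k = x."""
    states = range(len(P))
    if x in C and rng.random() < delta:
        return 1, rng.choices(states, weights=nu)[0]  # regeneration: X_{k+1} ~ nu
    weights = R[x] if x in C else P[x]
    return 0, rng.choices(states, weights=weights)[0]


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.3, 0.7]]
    C, delta, nu = {0, 1}, 0.8, [0.375, 0.625]  # P(x, .) >= 0.8 * nu for both states
    R = residual_kernel(P, C, delta, nu)
    rng = random.Random(1)
    x, moves = 0, {0: [0, 0], 1: [0, 0]}
    for _ in range(20_000):
        _, y = split_step(x, P, R, C, delta, nu, rng)
        moves[x][y] += 1
        x = y
    print({s: [cnt / sum(row) for cnt in row] for s, row in moves.items()})
```

The empirical transition frequencies printed at the end should be close to the rows of `P`.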

One also easily checks that $(X_i)_{i \ge 0}$ is a Markov chain with transition function $P$.

For each initial measure $\mu$ we denote by $\mu^*$ the measure on $\mathcal{X} \times \{0, 1\}$ such that $\mu^*(A \times \{0\}) = \mu(A \cap C)(1 - \delta) + \mu(A \cap C^c)$ and $\mu^*(A \times \{1\}) = \mu(A \cap C)\delta$. We continue the convention $\mathbb{P}_{x^*} = \mathbb{P}_{\delta_x^*}$, where $\delta_x$ stands for the Dirac mass at $x$. A clear consequence of the construction is that

$$\{X_i, Y_j : i \le km,\ j \le k\} \text{ is independent of } \{X_i, Y_j : i \ge (k+1)m,\ j \ge k+1\}$$

conditionally on $Y_k = 1$. Moreover, $\{X_i, Y_j : i \ge (k+1)m,\ j \ge k+1\}$ has the same distribution as the process $\{X_i, Y_j : i, j \ge 0\}$ under $\mathbb{P}_{\nu^*}$, where

$$\mathbb{P}_{\nu^*}(Y_0 = 1, X_0 \in dx) = \delta 1_C(x)\nu(dx).$$






Therefore we can treat $\mathfrak{a} = C \times \{1\}$ as an atom of the chain $\tilde{X}^m$. Our approach to deviation inequalities will be based on the decomposition of the sum $\sum_{i=0}^{n-1} f(X_i)$ into almost independent excursions between consecutive return times to $\mathfrak{a}$.

Let $\sigma = \sigma(0) = \min\{k \ge 0 : Y_k = 1\}$ and

$$\sigma(i) = \min\{k > \sigma(i-1) : Y_k = 1\}, \quad i \ge 1,$$

and in the same way $\tau = \tau(1) = \min\{k \ge 1 : Y_k = 1\}$ and

$$\tau(i) = \min\{k > \tau(i-1) : Y_k = 1\}, \quad i > 1.$$

For each $i$ we define

$$s_i(f) = \sum_{j=m(\sigma(i)+1)}^{m\sigma(i+1)+m-1} f(X_j) = \sum_{j=\sigma(i)+1}^{\sigma(i+1)} Z_j(f),$$

where $Z_j(f) = \sum_{k=0}^{m-1} f(X_{jm+k})$. The main result on the excursions is the following ([16, Theorem 17.3.1]).

Theorem 1. The two collections of random variables

$$\{s_i(f) : 0 \le i \le k-2\}, \qquad \{s_i(f) : i \ge k\}$$

are independent for any $k \ge 2$. For every $i$, the distribution of $s_i(f)$ is equal to the $\mathbb{P}_{\mathfrak{a}}$ distribution of $\sum_{k=m}^{m\tau+m-1} f(X_k)$, which is equal to the $\mathbb{P}_{\nu^*}$ distribution of $\sum_{k=0}^{m\sigma+m-1} f(X_k)$. Moreover, the common mean of $s_i(f)$ may be expressed as $\mathbb{E}s_i(f) = \delta^{-1}\pi(C)^{-1}m\int f\,d\pi$.



Therefore if $\pi(f) = 0$ then also $\mathbb{E}s_i(f) = 0$. We use the above result to decompose the path into three parts:

$$\Big|\sum_{k=0}^{n-1} f(X_k)\Big| \le \Big|\sum_{k=0}^{\min(m\sigma+(m-1),\,n-1)} f(X_k)\Big| + \Big|\sum_{i=1}^{N} s_{i-1}(f)\Big| + \Big|1_{\{N>0\}}\sum_{k=n}^{m\sigma(N)+(m-1)} f(X_k)\Big| =: U_n(f) + V_n(f) + W_n(f), \qquad (2)$$

where

$$N = \inf\{i \ge 0 : m\sigma(i) + (m-1) \ge n-1\}. \qquad (3)$$

Note that $N$ is a stopping time with respect to the filtration $\mathcal{F}_i = \sigma(s_0, \dots, s_{i-1}, \sigma(0), \sigma(1), \dots, \sigma(i))$.
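The bookkeeping behind (2) and (3) can be checked mechanically. The sketch below (hypothetical helper names; the regeneration indicators $Y_k$ are taken as given rather than simulated) extracts the regeneration times $\sigma(i)$, the excursion sums $s_i(f)$, and the three pieces of (2) before taking absolute values, and lets one verify that $\sum_{k<n} f(X_k)$ equals the first piece plus the middle piece minus the overshoot.

```python
def regeneration_times(ys):
    """sigma(0) < sigma(1) < ...: all indices k >= 0 with Y_k = 1."""
    return [k for k, y in enumerate(ys) if y == 1]


def excursion_sums(fx, sig, m=1):
    """s_i(f) = sum of f(X_j) over j = m(sigma(i)+1), ..., m*sigma(i+1) + m - 1."""
    return [sum(fx[m * (a + 1): m * b + m]) for a, b in zip(sig, sig[1:])]


def decompose(fx, sig, n, m=1):
    """Signed versions (u, v, w) of U_n, V_n, W_n in (2), so that sum(fx[:n]) == u + v - w.

    Requires some i with m*sigma(i) + m - 1 >= n - 1, i.e. N in (3) must be finite.
    """
    N = next(i for i, s in enumerate(sig) if m * s + m - 1 >= n - 1)
    u = sum(fx[: min(m * sig[0] + m - 1, n - 1) + 1])
    v = sum(excursion_sums(fx, sig[: N + 1], m))  # s_0 + ... + s_{N-1}
    w = sum(fx[n: m * sig[N] + m]) if N > 0 else 0
    return u, v, w


if __name__ == "__main__":
    ys = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    sig = regeneration_times(ys)  # [2, 5, 9, 14]
    fx = list(range(1, 31))  # f(X_0), ..., f(X_29); with m = 2 each Z_j has 2 terms
    print(excursion_sums(fx, sig, m=2))
    print(decompose(fx, sig, n=17, m=2))
```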

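In order to prove exponential inequalities we need to provide appropriate integrability conditions for all the summands above. They will be expressed in terms of the following exponential norms:

$$a = \Big\|\sum_{i=0}^{\sigma}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_{x^*}} < \infty; \qquad b = \Big\|\sum_{i=0}^{\sigma}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_{\pi^*}} < \infty; \qquad c = \|s_i(f)\|_{\psi_\alpha,\mathbb{P}} < \infty. \qquad (4)$$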
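The above quantities are rather troublesome to use (as they are expressed in terms of the split chain on the enlarged probability space); therefore we prefer their counterparts expressed in terms of the original chain, i.e. without referring to the auxiliary variables $Y_i$. They are

$$\mathcal{A} = \Big\|\sum_{i=0}^{\tau_C-1}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_x}; \qquad \mathcal{B} = \Big\|\sum_{i=0}^{\tau_C-1}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_\pi};$$
$$\mathcal{C} = \sup_{x\in C}\Big\|\sum_{i=1}^{\tau_C-1}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_x}; \qquad \mathcal{D} = \sup_{x\in C}\|Z_0(f)\|_{\psi_\alpha,\mathbb{P}_x},$$

where $\tau_C = \tau_C(1) = \inf\{k \ge 1 : X_{km} \in C\}$. For later use define also $\tau_C(i) = \inf\{k > \tau_C(i-1) : X_{km} \in C\}$ for $i > 1$.

Note that for $m = 1$, $Z_0(f) = f(x)$ under $\mathbb{P}_x$, so $\mathcal{D} = (\log 2)^{-1/\alpha}\sup_{x\in C}|f(x)|$.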

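Let $r \ge 1$ be the unique solution of

$$2^{1/r}\delta^{1-1/r} + 2^{1+1/r}(1-\delta)^{1-1/r} = 2 \qquad (5)$$

(recall that $\delta$ is the number appearing in the minorization condition (1)).

In particular, if $\delta = 1$, then $r = 1$. Moreover, by the concavity of $x \mapsto x^{1-1/r}$ and the monotonicity of the left-hand side above in $r \in [1, \infty)$ we get $r \le \log\big(\frac{6}{2-\delta}\big)/\log\big(\frac{2}{2-\delta}\big)$.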
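Since the left-hand side of (5) is decreasing in $r$ (it equals $\delta(2/\delta)^{1/r} + 2(1-\delta)(2/(1-\delta))^{1/r}$, which decreases from 6 towards $2 - \delta$ for $\delta < 1$), $r$ is easy to compute by bisection. The sketch below (hypothetical names) does this and compares the result with the explicit upper bound; the degenerate case $\delta = 1$ returns $r = 1$, as stated in the text.

```python
import math


def lhs5(r: float, delta: float) -> float:
    """Left-hand side of (5): 2^{1/r} delta^{1-1/r} + 2^{1+1/r} (1-delta)^{1-1/r}."""
    s = 1.0 / r
    return 2.0 ** s * delta ** (1.0 - s) + 2.0 ** (1.0 + s) * (1.0 - delta) ** (1.0 - s)


def solve_r(delta: float, rel_tol: float = 1e-13) -> float:
    """Root r >= 1 of (5) by bisection; the left-hand side is decreasing in r."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    if delta == 1.0:
        return 1.0
    lo, hi = 1.0, 2.0  # lhs5(1, delta) = 6 > 2 for delta < 1
    while lhs5(hi, delta) > 2.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if lhs5(mid, delta) > 2.0:
            lo = mid
        else:
            hi = mid
    return hi


def r_upper_bound(delta: float) -> float:
    """Explicit bound log(6/(2-delta)) / log(2/(2-delta))."""
    return math.log(6.0 / (2.0 - delta)) / math.log(2.0 / (2.0 - delta))


if __name__ == "__main__":
    for d in (0.05, 0.2, 0.5, 0.9):
        print(d, solve_r(d), r_upper_bound(d))
```

For $\delta = 1/2$ the two values coincide, since then $\delta = 1 - \delta$ and the concavity step is an equality.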

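The following proposition provides a comparison between $a, b, c$ and $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}$.

Proposition 1. In the setting described above with $\alpha \in (0, 1]$ the following inequalities hold:

$$a \le r^{1/\alpha}\big((\max\{\mathcal{A}, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha\big)^{1/\alpha};$$
$$b \le r^{1/\alpha}\big((\max\{\mathcal{B}, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha\big)^{1/\alpha};$$
$$c \le r^{1/\alpha}\big(\mathcal{C}^\alpha + \mathcal{D}^\alpha\big)^{1/\alpha}.$$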
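The bounds of Proposition 1 are explicit once $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}$ and $r$ are known (e.g. $r$ computed numerically from (5)). A minimal helper with a hypothetical name:

```python
def proposition1_bounds(A, B, C, D, alpha, r):
    """Upper bounds on (a, b, c) from Proposition 1, given the norms A, B, C, D and r from (5)."""
    def combine(X):
        # r^{1/alpha} (X^alpha + D^alpha)^{1/alpha}
        return (r * (X ** alpha + D ** alpha)) ** (1.0 / alpha)

    return combine(max(A, C)), combine(max(B, C)), combine(C)


if __name__ == "__main__":
    print(proposition1_bounds(2.0, 1.0, 1.5, 0.5, alpha=1.0, r=1.0))
```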

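Proof. Recall that the distributions of $s_i(f)$ are the same for all $i$ and equal to the $\mathbb{P}_{\nu^*}$ distribution of $\sum_{k=0}^{\sigma} Z_k(f)$. Let $a \in C$ be an arbitrary point. We will write $\mathbb{E}_{a,i}$ ($i = 0, 1$) to denote the conditional expectation (on the enlarged probability space) given $X_0 = a$, $Y_0 = i$. In particular, by the construction of the split process the distribution of $(X_m, X_{m+1}, \dots)$ under $\mathbb{E}_{a,1}$ does not depend on $a \in C$ (and is equal to the $\mathbb{E}_{\nu^*}$ distribution of $(X_0, X_1, \dots)$). Thus, by standard conditioning arguments and the subadditivity of $t \mapsto t^\alpha$, the following inequality holds for any $c > 0$:

$$\mathbb{E}\exp(c^{-\alpha}|s_i(f)|^\alpha) \le \mathbb{E}_{a,1}\exp\Big(c^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big)\sum_{k=1}^{\infty}\Big[(1-\delta)\sup_{x\in C}\mathbb{E}_{x,0}\exp\Big(c^{-\alpha}\Big(\sum_{j=0}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big)\Big]^{k-1}\sup_{x\in C}\delta\,\mathbb{E}_{x,1}\exp(c^{-\alpha}|Z_0(f)|^\alpha).$$

Therefore (for $c$ large enough that the series converges)

$$\mathbb{E}\exp(c^{-\alpha}|s_i(f)|^\alpha) \le \frac{\delta\,\mathbb{E}_{a,1}\exp\big(c^{-\alpha}(\sum_{j=1}^{\tau_C-1}|Z_j(f)|)^\alpha\big)\sup_{x\in C}\mathbb{E}_{x,1}\exp(c^{-\alpha}|Z_0(f)|^\alpha)}{1 - (1-\delta)\sup_{x\in C}\mathbb{E}_{x,0}\exp\big(c^{-\alpha}(\sum_{j=0}^{\tau_C-1}|Z_j(f)|)^\alpha\big)}.$$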
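Set $c = r^{1/\alpha}(\mathcal{C}^\alpha + \mathcal{D}^\alpha)^{1/\alpha}$. Let $p^{-1} = \frac{\mathcal{D}^\alpha}{\mathcal{C}^\alpha + \mathcal{D}^\alpha}$ and $q^{-1} = \frac{\mathcal{C}^\alpha}{\mathcal{C}^\alpha + \mathcal{D}^\alpha}$, so that $c^{-\alpha} = (pr)^{-1}\mathcal{D}^{-\alpha} = (qr)^{-1}\mathcal{C}^{-\alpha}$. Recall that $r \ge 1$. By Hölder's and Jensen's inequalities,

$$\sup_{x\in C}\mathbb{E}_{x,0}\exp\Big(c^{-\alpha}\Big(\sum_{j=0}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big) \le \Big(\sup_{x\in C}\mathbb{E}_{x,0}\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha)\Big)^{\frac{1}{pr}}\Big(\sup_{x\in C}\mathbb{E}_{x,0}\exp\Big(\mathcal{C}^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big)\Big)^{\frac{1}{qr}},$$

$$\sup_{x\in C}\mathbb{E}_{x,1}\exp(c^{-\alpha}|Z_0(f)|^\alpha) \le \Big(\sup_{x\in C}\mathbb{E}_{x,1}\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha)\Big)^{\frac{1}{pr}}$$

and

$$\mathbb{E}_{a,1}\exp\Big(c^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big) \le \Big(\mathbb{E}_{a,1}\exp\Big(\mathcal{C}^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big)\Big)^{\frac{1}{qr}}.$$

Define

$$\mathsf{X} = \sup_{x\in C}\mathbb{E}_{x,0}\exp\Big(\mathcal{C}^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big), \qquad \mathsf{Y} = \mathbb{E}_{a,1}\exp\Big(\mathcal{C}^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big)$$

and

$$\mathsf{V} = \sup_{x\in C}\mathbb{E}_{x,0}\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha), \qquad \mathsf{W} = \sup_{x\in C}\mathbb{E}_{x,1}\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha).$$

Using the above inequalities we get

$$\mathbb{E}\exp(c^{-\alpha}|s_i(f)|^\alpha) \le \frac{\delta\,\mathsf{Y}^{\frac{1}{qr}}\mathsf{W}^{\frac{1}{pr}}}{1 - (1-\delta)\mathsf{X}^{\frac{1}{qr}}\mathsf{V}^{\frac{1}{pr}}}.$$

By the definition of $\mathcal{C}, \mathcal{D}$ and the split chain construction

$$(1-\delta)\mathsf{X} + \delta\mathsf{Y} = \sup_{x\in C}\mathbb{E}_x\exp\Big(\mathcal{C}^{-\alpha}\Big(\sum_{j=1}^{\tau_C-1}|Z_j(f)|\Big)^\alpha\Big) \le 2 \qquad (6)$$

and

$$(1-\delta)\mathsf{V} \le \sup_{x\in C}\mathbb{E}_x\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha) \le 2 \quad\text{and}\quad \delta\mathsf{W} \le \sup_{x\in C}\mathbb{E}_x\exp(\mathcal{D}^{-\alpha}|Z_0(f)|^\alpha) \le 2. \qquad (7)$$

Thus, since $\frac{1}{pr} + \frac{1}{qr} = \frac{1}{r}$, by the definition of $r$,

$$\mathbb{E}\exp(c^{-\alpha}|s_i(f)|^\alpha) \le \frac{\delta^{1-1/r}2^{1/r}}{1 - (1-\delta)^{1-1/r}2^{1/r}} = 2,$$

which proves the required bound on $c$.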

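We will now estimate $a$ and $b$. In fact we will prove a more general statement. Consider any probability measure $\mu$ on $\mathcal{X}$ and denote $M_\mu = \|\sum_{i=0}^{\sigma_C-1}|Z_i(f)|\|_{\psi_\alpha,\mathbb{P}_\mu}$, where $\sigma_C = \inf\{i \ge 0 : X_{im} \in C\}$. Note that $M_\mu \le \|\sum_{i=0}^{\tau_C-1}|Z_i(f)|\|_{\psi_\alpha,\mathbb{P}_\mu}$. In particular $M_\pi \le \mathcal{B}$ and $M_{\delta_x} \le \mathcal{A}$. Thus, to end the proof it is enough to show that

$$\Big\|\sum_{i=0}^{\sigma}|Z_i(f)|\Big\|_{\psi_\alpha,\mathbb{P}_{\mu^*}} \le r^{1/\alpha}\big((\max\{M_\mu, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha\big)^{1/\alpha}. \qquad (8)$$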

i=0 
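Arguing as above we obtain that for any $c > 0$,

$$\mathbb{E}_{\mu^*}\exp\Big(c^{-\alpha}\Big(\sum_{i=0}^{\sigma}|Z_i(f)|\Big)^\alpha\Big) \le \frac{\delta\,\mathbb{E}_\mu\exp\big(c^{-\alpha}(\sum_{j=0}^{\sigma_C-1}|Z_j(f)|)^\alpha\big)\sup_{x\in C}\mathbb{E}_{x,1}\exp(c^{-\alpha}|Z_0(f)|^\alpha)}{1 - (1-\delta)\sup_{y\in C}\mathbb{E}_{y,0}\exp\big(c^{-\alpha}(\sum_{j=0}^{\tau_C-1}|Z_j(f)|)^\alpha\big)}.$$

Set $c = r^{1/\alpha}((\max\{M_\mu, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha)^{1/\alpha}$. Let $p^{-1} = \frac{\mathcal{D}^\alpha}{(\max\{M_\mu, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha}$ and $q^{-1} = \frac{(\max\{M_\mu, \mathcal{C}\})^\alpha}{(\max\{M_\mu, \mathcal{C}\})^\alpha + \mathcal{D}^\alpha}$. Applying the notation $\mathsf{X}, \mathsf{V}, \mathsf{W}$ and Hölder's inequality we get

$$\mathbb{E}_{\mu^*}\exp\Big(c^{-\alpha}\Big(\sum_{i=0}^{\sigma}|Z_i(f)|\Big)^\alpha\Big) \le \frac{\delta\,2^{\frac{1}{qr}}\mathsf{W}^{\frac{1}{pr}}}{1 - (1-\delta)\mathsf{X}^{\frac{1}{qr}}\mathsf{V}^{\frac{1}{pr}}}$$

and thus by (6) and (7) we deduce that

$$\mathbb{E}_{\mu^*}\exp\Big(c^{-\alpha}\Big(\sum_{i=0}^{\sigma}|Z_i(f)|\Big)^\alpha\Big) \le \frac{\delta^{1-\frac{1}{pr}}2^{1/r}}{1 - (1-\delta)^{1-1/r}2^{1/r}} \le \frac{\delta^{1-1/r}2^{1/r}}{1 - (1-\delta)^{1-1/r}2^{1/r}} = 2$$

by the definition of $r$. This ends the proof of (8). □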



3 Drift conditions 

Our next goal is to provide conditions guaranteeing that $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}$ are finite, which via Proposition 1 and inequalities for independent summands will allow us to control the quantities $U_n(f), V_n(f), W_n(f)$ in (2).

Drift conditions are one of the standard tools that have proved useful in the analysis of integrability properties of the excursions of Markov chains.

Below we consider two types of drift criteria. The first one is the multiplicative drift condition introduced in [14, 13] to deal with pure exponential integrability (i.e. with $\alpha = 1$). It is known [13] that in the case $m = 1$ this condition is equivalent to exponential integrability (finiteness of $\psi_1$ norms) of $\sum_{i=0}^{\tau_C-1} f(X_i)$. Below we analyze the drift condition expressed for the $m$-skeleton and a properly modified function, obtaining sufficient conditions for the finiteness of the parameters $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}$. We also show that the multiplicative drift condition is in a sense a minimal requirement for proving exponential integrability of the excursion, in particular obtaining good constants in the equivalence proved in [13].

The multiplicative drift condition, although important from the theoretical point of view, is of limited use in applications, as it is difficult to verify when compared to the classical drift criteria used to obtain integrability of the regeneration time. Therefore we subsequently analyze what integrability properties of the excursion can be obtained with such simpler drift conditions. We show that by assuming integrability conditions on the regeneration time and a simple drift condition on the function $f$, one can still obtain $\psi_\alpha$ integrability, though with $\alpha < 1$. This still allows one to obtain meaningful exponential inequalities, which for moderate values of $t$ agree with the classical Bernstein bound.

3.1 Multiplicative geometric drift condition 

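Observe that for any initial measure $\nu$ and $\alpha \in (0, 1]$,

$$\Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big\|_{\psi_\alpha,\nu} \le \Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big\|_{\psi_1,\nu}^{1/\alpha}. \qquad (9)$$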
In particular, in order to bound $a, b, c$ it suffices to control the usual exponential $\psi_1$ norms of $\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha$. Let thus $g(x) = \log\mathbb{E}_x\exp(2c^{-1}|Z_0(f)|^\alpha)$ and assume that for some $c$ it satisfies the multiplicative geometric drift condition from [14, 13], i.e. that there exist a function $V : \mathcal{X} \to \mathbb{R}_+$ and constants $b > 0$, $K > 0$ such that

$$\exp(-V(x))P^m(\exp(V))(x) \le \exp(-g(x) + b1_C(x)), \qquad (10)$$

and $V(x) \le K$ for $x \in C$.

The drawback of the drift condition (10) in comparison with the usual drift criteria is that in practice its direct verification is difficult. At the same time it turns out that it is in fact equivalent to the existence of $c < \infty$ such that

$$\sup_{x\in C}\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1} g(X^m_k)\Big) =: d < \infty \quad\text{and}\quad \forall_{x\in\mathcal{X}}\ \Big\|\sum_{k=0}^{\tau_C-1} g(X^m_k)\Big\|_{\psi_1,\mathbb{P}_x} < \infty. \qquad (11)$$



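Note that (11) implies

$$\sup_{x\in C}\Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big\|_{\psi_1,\mathbb{P}_x} < \infty \quad\text{and}\quad \forall_{x\in\mathcal{X}}\ \Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big\|_{\psi_1,\mathbb{P}_x} < \infty \qquad (12)$$

due to the Schwarz inequality

$$\mathbb{E}_x\exp\Big(c^{-1}\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big) \le \Big[\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1}g(X^m_k)\Big)\Big]^{1/2}\Big[\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1}\big(2c^{-1}|Z_k(f)|^\alpha - g(X^m_k)\big)\Big)\Big]^{1/2} \le \Big[\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1}g(X^m_k)\Big)\Big]^{1/2}, \qquad (13)$$

where in the last step we used that, by the definition of $g$ and the Markov property, $\exp(\sum_{k=0}^{n-1}(2c^{-1}|Z_k(f)|^\alpha - g(X^m_k)))$ is a nonnegative martingale with respect to $(\mathcal{F}^X_{nm})_n$, so that its expectation at the stopping time $\tau_C$ is at most 1 by Fatou's lemma. Moreover, if $m = 1$ then (11) and (12) are equivalent (note that in this case $g(X_k) = 2c^{-1}|f(X_k)|^\alpha$ and so we do not need the additional argument above).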

We will now prove that the drift condition for $g$ is equivalent to (11). More precisely, we have the following.

Theorem 2. Conditions (10) and (11) are equivalent in the sense that

1. Whenever (10) holds, then so does (11). Moreover, for every $x \in \mathcal{X}$,

$$\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1} g(X^m_k)\Big) \le \exp(b1_C(x) + V(x))$$

and consequently

$$\sup_{x\in C}\Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big\|_{\psi_1,\mathbb{P}_x} \le \max\Big\{1, \frac{b+K}{\log 2}\Big\}c.$$

2. Whenever (11) holds, then the function $g$ satisfies (10) with $V(x) = \log(G_C(x, g))$, where

$$G_C(x, g) = \mathbb{E}_x\exp\Big(\sum_{k=0}^{\sigma_C} g(X^m_k)\Big),$$

and $b = 2\log d$, $K = \log d$.

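Proof. Suppose that (10) holds. Let $\mathcal{F}_k = \sigma(X^m_0, X^m_1, \dots, X^m_k)$, $k = 0, 1, 2, \dots$, and define the exponential martingale

$$M_k = \exp(V(X^m_k))\prod_{i=0}^{k-1}\frac{\exp(V(X^m_i))}{\mathbb{E}(\exp(V(X^m_{i+1}))\mid\mathcal{F}_i)}. \qquad (14)$$

Therefore, for the stopping time $\tau_C \wedge n$ we have

$$\mathbb{E}_x M_{\tau_C\wedge n} = \exp(V(x)), \quad x \in \mathcal{X}. \qquad (15)$$

Due to (10) we obtain that

$$\frac{\exp(V(X^m_i))}{\mathbb{E}(\exp(V(X^m_{i+1}))\mid\mathcal{F}_i)} = \frac{\exp(V(X^m_i))}{P^m(\exp(V))(X^m_i)} \ge \exp\big(g(X^m_i) - b1_C(X^m_i)\big)$$

and hence, since $V \ge 0$ and $X^m_i \notin C$ for $0 < i < \tau_C$,

$$M_{\tau_C\wedge n} \ge \exp\Big(\sum_{k=0}^{(\tau_C\wedge n)-1} g(X^m_k) - b1_C(x)\Big) \quad \mathbb{P}_x\text{-a.s.}$$

Consequently, by (14) and (15), for every $x \in \mathcal{X}$,

$$\mathbb{E}_x\exp\Big(\sum_{k=0}^{(\tau_C\wedge n)-1} g(X^m_k)\Big) \le \exp(b1_C(x) + V(x)),$$

therefore by letting $n \to \infty$ and using Fatou's lemma we get

$$\mathbb{E}_x\exp\Big(\sum_{k=0}^{\tau_C-1} g(X^m_k)\Big) \le \exp(b1_C(x) + V(x)).$$

Now, by (13),

$$\mathbb{E}_x\exp\Big(c^{-1}\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\Big) \le \exp\big((b1_C(x) + V(x))/2\big) \le \exp\big((b1_C(x) + V(x) + b + K)/2\big),$$

which for $x \in C$ is at most $\exp(b + K)$ and, via Hölder's inequality, implies the second inequality of point 1.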
To prove the second assertion observe that

$$(P^m G_C)(x) = \mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big) = \exp(-g(x))\,\mathbb{E}_x\exp\Big(\sum_{k=0}^{\sigma_C} g(X^m_k)\Big)\exp\Big(1_C(x)\log\mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big)\Big)$$
$$= \exp\Big(-g(x) + 1_C(x)\log\mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big)\Big)G_C(x, g),$$

i.e.

$$\exp(-V(x))(P^m\exp(V))(x) = \exp\Big(-g(x) + 1_C(x)\log\mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big)\Big).$$

From the definition of $d$ in (11) we conclude (using $g \ge 0$ and $X^m_{\tau_C} \in C$) that $g(x) \le \log d$ for $x \in C$ and

$$\sup_{x\in C}\mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big) \le d^2.$$

Therefore

$$\sup_{x\in C}\log\Big(\mathbb{E}_x\exp\Big(\sum_{k=1}^{\tau_C} g(X^m_k)\Big)\Big) \le 2\log d$$

and

$$\sup_{x\in C}V(x) = \sup_{x\in C}\log(G_C(x, g)) = \sup_{x\in C}g(x) \le \log d.$$

This shows that (10) holds with $b = 2\log d$ and $K = \log d$. □
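Corollary 1. If (10) is satisfied then for every $x \in \mathcal{X}$,

$$\mathcal{A} \le \Big(\frac{\max\{2b + V(x) + K, 2\log 2\}}{2\log 2}\,c\Big)^{1/\alpha},$$
$$\mathcal{B} \le \Big(\frac{\max\{2b + 2\log\pi(\exp(V/2)) + K, 2\log 2\}}{2\log 2}\,c\Big)^{1/\alpha},$$
$$\mathcal{C}, \mathcal{D} \le \Big(\frac{\max\{b + K, \log 2\}}{\log 2}\,c\Big)^{1/\alpha}.$$

Proof. It is enough to combine the estimate $\mathcal{C}^\alpha, \mathcal{D}^\alpha \le \sup_{x\in C}\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|^\alpha\|_{\psi_1,\mathbb{P}_x}$, the first part of Theorem 2 (integrated with respect to $\pi$ in the case of the quantity $\mathcal{B}$), observation (13) and Hölder's inequality. □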
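The bounds of Corollary 1 are explicit in the drift parameters $b$, $K$, the function $V$ and the constant $c$ from the definition of $g$. A hypothetical helper evaluating them (the inputs, including $\pi(\exp(V/2))$, are assumed to be known or estimated beforehand):

```python
import math


def corollary1_bounds(b, K, c, alpha, V_x, pi_exp_half_V):
    """Upper bounds of Corollary 1 on A (at the point x), on B, and on C and D.

    V_x is V(x); pi_exp_half_V is the integral pi(exp(V/2)).
    """
    L2 = math.log(2.0)
    A = (max(2 * b + V_x + K, 2 * L2) * c / (2 * L2)) ** (1.0 / alpha)
    B = (max(2 * b + 2 * math.log(pi_exp_half_V) + K, 2 * L2) * c / (2 * L2)) ** (1.0 / alpha)
    CD = (max(b + K, L2) * c / L2) ** (1.0 / alpha)
    return A, B, CD


if __name__ == "__main__":
    print(corollary1_bounds(b=1.0, K=2.0, c=1.0, alpha=0.5, V_x=3.0, pi_exp_half_V=2.0))
```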
3.2 'Regular' drift condition

As we have already mentioned, the multiplicative drift condition is difficult to check. Therefore we would like to replace it with a simpler criterion such as the usual drift condition (see [16], [18]). The price to pay is strengthening the requirements on $\tau_C$. Namely, we assume that

$$\sup_{x\in C}\|\tau_C\|_{\psi_\beta,\mathbb{P}_x} < \infty,$$

where $\beta > \alpha$. Note that if $\beta = 1$ then we are in the setting of geometric ergodicity. We would like to point out that one can verify such integrability conditions on $\tau_C$ by using classical drift criteria (see e.g. [16]) for $\beta = 1$ or their modified versions presented e.g. in [8] for $\beta < 1$. In the latter reference concentration inequalities for bounded $f$ were also presented. Our aim now is to provide drift conditions on $f$, simpler than those discussed in the previous section, which would complement the $\psi_\beta$ integrability of $\tau_C$ and yield strong exponential inequalities in the unbounded case.

Let $h(x) = \log\mathbb{E}_x\exp(c^{-\gamma}|Z_0(f)|^\gamma)$, $\gamma = \frac{\alpha\beta}{\beta-\alpha}$ (note that for $m = 1$ we have $h(x) = c^{-\gamma}|f(x)|^\gamma$). The drift condition we will consider is of the following form: suppose there exist $V : \mathcal{X} \to \mathbb{R}_+$ and $b > 0$ such that

$$(P^m V)(x) - V(x) \le -\exp(h(x)) + b1_C(x), \qquad (16)$$

and $V(x) \le K$ for $x \in C$. The usual martingale argument (see [16]) shows that for all $x \in \mathcal{X}$,

$$\mathbb{E}_x\sum_{k=0}^{\tau_C-1}\exp(h(X^m_k)) \le V(x) + b1_C(x). \qquad (17)$$

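Proposition 2. For $\alpha \in (0, 1]$ and $\beta > \alpha$, if (16) is satisfied, then

$$\sup_{x\in C}\Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big\|_{\psi_\alpha,\mathbb{P}_x} \le c_1c_2,$$

where $c_1 = \sup_{x\in C}\|\tau_C\|_{\psi_\beta,\mathbb{P}_x}$ and $c_2 = c\big(\max\big\{\frac{\log(K+b)}{\log 2}, 1\big\}\big)^{1/\gamma}$.

Proof. For $u, v > 0$ the Young inequality holds, i.e.

$$u^\alpha v^\alpha \le \frac{\alpha}{\beta}u^\beta + \Big(1 - \frac{\alpha}{\beta}\Big)v^\gamma.$$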




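Consequently

$$\mathbb{E}_x\exp\Big(c_1^{-\alpha}c_2^{-\alpha}\Big(\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big)^\alpha\Big) = \mathbb{E}_x\exp\Big(\Big(\frac{\tau_C}{c_1}\Big)^\alpha\Big(\tau_C^{-1}\sum_{k=0}^{\tau_C-1}c_2^{-1}|Z_k(f)|\Big)^\alpha\Big)$$
$$\le \mathbb{E}_x\exp\Big(\frac{\alpha}{\beta}\Big(\frac{\tau_C}{c_1}\Big)^\beta\Big)\exp\Big(\Big(1-\frac{\alpha}{\beta}\Big)\Big(\tau_C^{-1}\sum_{k=0}^{\tau_C-1}c_2^{-1}|Z_k(f)|\Big)^\gamma\Big).$$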
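Therefore, by the Hölder inequality,

$$\mathbb{E}_x\exp\Big(c_1^{-\alpha}c_2^{-\alpha}\Big(\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big)^\alpha\Big) \le \Big(\mathbb{E}_x\exp\Big(\Big(\frac{\tau_C}{c_1}\Big)^\beta\Big)\Big)^{\frac{\alpha}{\beta}}\Big(\mathbb{E}_x\exp\Big(\Big(\tau_C^{-1}\sum_{k=0}^{\tau_C-1}c_2^{-1}|Z_k(f)|\Big)^\gamma\Big)\Big)^{1-\frac{\alpha}{\beta}}.$$

Note that since $c_1 = \sup_{x\in C}\|\tau_C\|_{\psi_\beta,\mathbb{P}_x}$,

$$\mathbb{E}_x\exp\Big(\Big(\frac{\tau_C}{c_1}\Big)^\beta\Big) \le 2 \quad\text{for } x \in C.$$

Now we will estimate the second term. Since an average does not exceed the maximum and $(c/c_2)^\gamma \le 1$, Jensen's inequality gives

$$\mathbb{E}_x\exp\Big(\Big(\tau_C^{-1}\sum_{k=0}^{\tau_C-1}c_2^{-1}|Z_k(f)|\Big)^\gamma\Big) \le \mathbb{E}_x\exp\Big(\max_{0\le k<\tau_C}c_2^{-\gamma}|Z_k(f)|^\gamma\Big) \le \Big(\mathbb{E}_x\sum_{k=0}^{\tau_C-1}\exp\big(c^{-\gamma}|Z_k(f)|^\gamma\big)\Big)^{(c/c_2)^\gamma}.$$

Since $c_2 = c\big(\max\big\{\frac{\log(K+b)}{\log 2}, 1\big\}\big)^{1/\gamma}$, (17) implies that for $x \in C$,

$$\Big(\mathbb{E}_x\sum_{k=0}^{\tau_C-1}\exp\big(c^{-\gamma}|Z_k(f)|^\gamma\big)\Big)^{(c/c_2)^\gamma} = \Big(\mathbb{E}_x\sum_{k=0}^{\tau_C-1}\exp(h(X^m_k))\Big)^{(c/c_2)^\gamma} \le (K+b)^{(c/c_2)^\gamma} \le 2$$

and so

$$\sup_{x\in C}\mathbb{E}_x\exp\Big(c_1^{-\alpha}c_2^{-\alpha}\Big(\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big)^\alpha\Big) \le 2,$$

which completes the proof. □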

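Basically the same idea can be used to bound $\mathcal{A}$ and $\mathcal{B}$. We summarize the result in the following theorem.

Theorem 3. Whenever the drift condition (16) is satisfied, the following inequalities hold:

1. Let $c_3 = \|\tau_C\|_{\psi_\beta,\mathbb{P}_x}$ and $c_4 = c\big(\max\big\{\frac{\log(V(x)+b)}{\log 2}, 1\big\}\big)^{1/\gamma}$; then

$$\mathcal{A} = \Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big\|_{\psi_\alpha,\mathbb{P}_x} \le c_3c_4.$$

2. Let $c_5 = \|\tau_C\|_{\psi_\beta,\mathbb{P}_\pi}$ and $c_6 = c\big(\max\big\{\frac{\log(\pi(V)+b)}{\log 2}, 1\big\}\big)^{1/\gamma}$; then

$$\mathcal{B} = \Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big\|_{\psi_\alpha,\mathbb{P}_\pi} \le c_5c_6.$$

3. Let $c_1 = \sup_{x\in C}\|\tau_C\|_{\psi_\beta,\mathbb{P}_x}$ and $c_2 = c\big(\max\big\{\frac{\log(K+b)}{\log 2}, 1\big\}\big)^{1/\gamma}$; then

$$\mathcal{C}, \mathcal{D} \le \sup_{x\in C}\Big\|\sum_{k=0}^{\tau_C-1}|Z_k(f)|\Big\|_{\psi_\alpha,\mathbb{P}_x} \le c_1c_2.$$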
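The scale constants $c_2$, $c_4$, $c_6$ in Theorem 3 have the same form and differ only in the level ($K+b$, $V(x)+b$ or $\pi(V)+b$) entering the logarithm. The sketch below (hypothetical names) computes $\gamma$ and these scales, and checks numerically the Young inequality used in the proof of Proposition 2; the $\psi_\beta$ norms of $\tau_C$ ($c_1$, $c_3$, $c_5$) are assumed to be supplied separately.

```python
import math


def gamma_exponent(alpha: float, beta: float) -> float:
    """gamma = alpha*beta / (beta - alpha), so that alpha/beta + alpha/gamma = 1."""
    if not (0.0 < alpha <= 1.0 and beta > alpha):
        raise ValueError("need alpha in (0, 1] and beta > alpha")
    return alpha * beta / (beta - alpha)


def drift_scale(c: float, level: float, gamma: float) -> float:
    """c * max(log(level)/log 2, 1)^{1/gamma}; level is K+b, V(x)+b or pi(V)+b (> 0)."""
    return c * max(math.log(level) / math.log(2.0), 1.0) ** (1.0 / gamma)


def young_gap(u: float, v: float, alpha: float, beta: float) -> float:
    """(alpha/beta) u^beta + (1 - alpha/beta) v^gamma - u^alpha v^alpha; >= 0 by Young."""
    gamma = gamma_exponent(alpha, beta)
    return (alpha / beta) * u ** beta + (1 - alpha / beta) * v ** gamma - (u * v) ** alpha


if __name__ == "__main__":
    g = gamma_exponent(0.5, 1.0)  # beta = 1: gamma = alpha / (1 - alpha)
    print(g, drift_scale(1.0, 8.0, g))
```

Multiplying `drift_scale(c, K + b, gamma)` by an estimate of $c_1$ then gives the bound $c_1c_2$ on $\mathcal{C}$ and $\mathcal{D}$.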
2:60 fc=0 
Remark. If $\beta = 1$ then $\gamma = \alpha/(1-\alpha)$ and the requirement on $\tau_C$ is equivalent to geometric ergodicity. Therefore, in the simplest case $m = 1$, to obtain meaningful bounds on $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{D}$ it is enough, e.g., to assume the classical drift condition for $V : \mathcal{X} \to [1, \infty)$, i.e.

$$(PV)(x) - V(x) \le -\lambda V(x) + b1_C(x),$$

and

$$|f(x)| \le c(\log\lambda + \log V(x))^{(1-\alpha)/\alpha},$$

since then $\exp(h(x)) \le \lambda V(x)$ and (16) holds. The estimate obtained by Theorem 3 will then be expressed in terms of $c$ and the classical bounds on the regeneration time in terms of $V$ and $\lambda$ (see e.g. [16]). We leave the details to the reader.
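To see what the growth condition on $f$ allows, suppose (purely as an illustration, not an example from the text) that the drift function is $V(x) = e^{s|x|}$ on the real line and $\lambda V \ge 1$. Then $\log V(x) = s|x|$ and the admissible envelope grows like $|x|^{(1-\alpha)/\alpha}$; for instance $\alpha = 1/2$ allows linear growth of $f$, and $\alpha = 1/3$ quadratic growth. The helper name and the choice of $V$ below are hypothetical.

```python
import math


def allowed_growth(x: float, c: float, alpha: float, lam: float, s: float) -> float:
    """Envelope c (log(lam) + log V(x))^{(1-alpha)/alpha} for the hypothetical V(x) = exp(s|x|).

    Assumes lam * V(x) >= 1, so that the logarithm is nonnegative.
    """
    log_v = s * abs(x)  # log V(x) for V(x) = exp(s |x|)
    base = math.log(lam) + log_v
    if base < 0:
        raise ValueError("need lam * V(x) >= 1")
    return c * base ** ((1.0 - alpha) / alpha)


if __name__ == "__main__":
    for x in (1.0, 10.0, 100.0):
        print(x, allowed_growth(x, c=1.0, alpha=0.5, lam=1.0, s=1.0))
```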



4 The main tool. Exponential inequalities in the independent case 

In this section we develop some inequalities for sums of independent (or one-dependent) unbounded random variables, which we will later combine with the renewal approach to obtain results for Markov chains. The first lemma will let us truncate the variables and reduce the problem to well-known inequalities for bounded summands. Contrary to the approach in [1], at this point we do not need Talagrand's inequalities and thus we are able to obtain explicit constants (this happens at the cost of weakening the generality of the result in the independent case, but improves the inequalities which are then applied to Markov chains). To avoid formal problems, below we adopt the convention $\log n := \log(n \vee e)$.

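Lemma 1. Let $\xi_0, \dots, \xi_{n-1}$ be i.i.d. random variables such that $\mathbb{E}\exp(c^{-\alpha}|\xi_i|^\alpha) \le 2$ for some $\alpha \in (0, 1]$. Let $M = c(3\alpha^{-2}\log n)^{1/\alpha}$ and $Y_i = \xi_i 1_{\{|\xi_i| > M\}}$. Then for any $0 < \lambda \le 1/(2^{1/\alpha}c)$,

$$\mathbb{E}\exp\Big(\lambda^\alpha\sum_{i=0}^{n-1}(|Y_i| + \mathbb{E}|Y_i|)^\alpha\Big) \le \exp(8).$$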
i=o 

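Proof. Note that by independence and the subadditivity of $t \mapsto t^\alpha$,

$$\mathbb{E}\exp\Big(\lambda^\alpha\sum_{i=0}^{n-1}(|Y_i| + \mathbb{E}|Y_i|)^\alpha\Big) \le \mathbb{E}\exp\Big(\lambda^\alpha\sum_{i=0}^{n-1}\big(|Y_i|^\alpha + (\mathbb{E}|Y_i|)^\alpha\big)\Big) = \exp\big(n\lambda^\alpha(\mathbb{E}|\xi_0|1_{\{|\xi_0|>M\}})^\alpha\big)\big(\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}})\big)^n. \qquad (18)$$

By Markov's inequality

$$\mathbb{P}(|\xi_0| > t) \le e^{-t^\alpha/c^\alpha}\,\mathbb{E}\exp(|\xi_0|^\alpha/c^\alpha) \le 2e^{-t^\alpha/c^\alpha}. \qquad (19)$$

We thus obtain that

$$\mathbb{E}|\xi_0|1_{\{|\xi_0|>M\}} = \int_0^\infty\mathbb{P}(|\xi_0|1_{\{|\xi_0|>M\}} > t)\,dt = M\,\mathbb{P}(|\xi_0| > M) + \int_M^\infty\mathbb{P}(|\xi_0| > t)\,dt \le 2Me^{-M^\alpha/c^\alpha} + 2\int_M^\infty e^{-t^\alpha/c^\alpha}\,dt. \qquad (20)$$

To bound the last term in (20) we need the following lemma.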
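Lemma 2. If $M \ge 2^{1/\alpha}\alpha^{-1/\alpha}c$ then

$$\int_M^\infty\exp\Big(-\frac{t^\alpha}{c^\alpha}\Big)dt \le M\exp\Big(-\frac{M^\alpha}{c^\alpha}\Big).$$

Proof. The change of variables $s = t^\alpha/c^\alpha$ implies that

$$\int_M^\infty e^{-t^\alpha/c^\alpha}dt = \frac{c}{\alpha}\int_{M^\alpha/c^\alpha}^\infty s^{\frac{1}{\alpha}-1}e^{-s}ds.$$

Then, again by a change of variables ($s \mapsto s + M^\alpha/c^\alpha$), we deduce that

$$\int_M^\infty e^{-t^\alpha/c^\alpha}dt = \frac{c}{\alpha}e^{-M^\alpha/c^\alpha}\int_0^\infty\Big(\frac{M^\alpha}{c^\alpha} + s\Big)^{\frac{1}{\alpha}-1}e^{-s}ds.$$

Using the inequality $1 + x \le \exp(x)$ we have

$$\int_0^\infty\Big(\frac{M^\alpha}{c^\alpha} + s\Big)^{\frac{1}{\alpha}-1}e^{-s}ds = \Big(\frac{M}{c}\Big)^{1-\alpha}\int_0^\infty\Big(1 + \frac{c^\alpha}{M^\alpha}s\Big)^{\frac{1}{\alpha}-1}e^{-s}ds \le \Big(\frac{M}{c}\Big)^{1-\alpha}\int_0^\infty\exp\Big(-s\Big(1 - \Big(\frac{1}{\alpha}-1\Big)\frac{c^\alpha}{M^\alpha}\Big)\Big)ds = \Big(\frac{M}{c}\Big)^{1-\alpha}\Big(1 - \Big(\frac{1}{\alpha}-1\Big)\frac{c^\alpha}{M^\alpha}\Big)^{-1}.$$

Since by assumption $M^\alpha/c^\alpha \ge 2/\alpha$, the right-hand side above is bounded by $2(M/c)^{1-\alpha} \le M\alpha/c$, and thus

$$\int_M^\infty\exp\Big(-\frac{t^\alpha}{c^\alpha}\Big)dt \le M\exp\Big(-\frac{M^\alpha}{c^\alpha}\Big).$$

□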
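Lemma 2 is easy to sanity-check numerically. The sketch below (hypothetical names; composite Simpson's rule is our own choice of quadrature) compares the integral with the bound, and with the closed form $\int_M^\infty e^{-\sqrt{t}}\,dt = 2(\sqrt{M}+1)e^{-\sqrt{M}}$ for $\alpha = 1/2$, $c = 1$, where the assumption reads $M \ge 16$.

```python
import math


def tail_integral(M: float, c: float, alpha: float, n: int = 200_000) -> float:
    """Composite Simpson approximation of int_M^inf exp(-(t/c)^alpha) dt.

    The range is cut where (t/c)^alpha exceeds (M/c)^alpha + 50, which changes the
    result only negligibly.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    z0 = (M / c) ** alpha
    upper = c * (z0 + 50.0) ** (1.0 / alpha)
    h = (upper - M) / n

    def f(t):
        return math.exp(-((t / c) ** alpha))

    total = f(M) + f(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(M + i * h)
    return total * h / 3.0


def lemma2_rhs(M: float, c: float, alpha: float) -> float:
    """M exp(-(M/c)^alpha), valid when M >= 2^{1/alpha} alpha^{-1/alpha} c."""
    return M * math.exp(-((M / c) ** alpha))


if __name__ == "__main__":
    print(tail_integral(16.0, 1.0, 0.5), 10.0 * math.exp(-4.0), lemma2_rhs(16.0, 1.0, 0.5))
```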



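Plugging the estimate from Lemma 2 into (20) (its assumption holds since $M^\alpha/c^\alpha = 3\alpha^{-2}\log n \ge 2/\alpha$) we conclude that

$$\exp\big(\lambda^\alpha(\mathbb{E}|\xi_0|1_{\{|\xi_0|>M\}})^\alpha\big) \le \exp\big(\lambda^\alpha[4Me^{-M^\alpha/c^\alpha}]^\alpha\big).$$

By definition $M = c(3\alpha^{-2}\log n)^{1/\alpha} \ge c(2\alpha^{-2}\log n + \alpha^{-2})^{1/\alpha}$, therefore

$$\exp\big(n\lambda^\alpha(\mathbb{E}|\xi_0|1_{\{|\xi_0|>M\}})^\alpha\big) \le \exp\big(3\cdot 4^\alpha\lambda^\alpha c^\alpha\,n\,\alpha^{-2}(\log n)\,n^{-2/\alpha}e^{-\alpha^{-1}}\big) \le \exp\Big(3\cdot 4^\alpha\lambda^\alpha c^\alpha\,\frac{4}{e^2}\,\frac{\log n}{n^{2/\alpha-1}}\Big),$$

where we used the fact that $\sup_{\alpha\in(0,1]}\alpha^{-2}\exp(-\alpha^{-1}) = 4/e^2$. Since $e^2 > 7$ and $\log n \le n \le n^{2/\alpha-1}$, we see that whenever $\lambda^\alpha c^\alpha \le 1/2$, we have

$$\exp\big(n\lambda^\alpha(\mathbb{E}|\xi_0|1_{\{|\xi_0|>M\}})^\alpha\big) \le \exp(4). \qquad (21)$$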

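Now we proceed to the estimate of $\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}})$. There is the following representation:

$$\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}}) = 1 + \int_1^\infty\mathbb{P}\big(|\xi_0|1_{\{|\xi_0|>M\}} > \lambda^{-1}\log^{1/\alpha}t\big)\,dt.$$

Let $t_0 = \exp(\lambda^\alpha M^\alpha)$, i.e. $M = \lambda^{-1}\log^{1/\alpha}t_0$. Observe that one can split the integral into two terms:

$$\int_1^\infty\mathbb{P}\big(|\xi_0|1_{\{|\xi_0|>M\}} > \lambda^{-1}\log^{1/\alpha}t\big)dt = (t_0 - 1)\,\mathbb{P}(|\xi_0| > M) + \int_{t_0}^\infty\mathbb{P}(|\xi_0| > \lambda^{-1}\log^{1/\alpha}t)\,dt.$$

Applying (19) again we obtain that

$$\int_1^\infty\mathbb{P}\big(|\xi_0|1_{\{|\xi_0|>M\}} > \lambda^{-1}\log^{1/\alpha}t\big)dt \le 2t_0e^{-M^\alpha/c^\alpha} + 2\int_{t_0}^\infty t^{-\frac{1}{\lambda^\alpha c^\alpha}}dt \le 2t_0e^{-M^\alpha/c^\alpha} + \frac{2\lambda^\alpha c^\alpha}{1 - \lambda^\alpha c^\alpha}\,t_0^{1-\frac{1}{\lambda^\alpha c^\alpha}}.$$

Since $t_0 = \exp(\lambda^\alpha M^\alpha)$, we have $t_0e^{-M^\alpha/c^\alpha} = t_0^{1-\frac{1}{\lambda^\alpha c^\alpha}}$, and thus we conclude that whenever $\lambda c < 1$,

$$\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}}) \le 1 + \frac{2}{1-\lambda^\alpha c^\alpha}\exp\Big(-(1-\lambda^\alpha c^\alpha)\frac{M^\alpha}{c^\alpha}\Big).$$

Thus, requiring that $\lambda^\alpha c^\alpha \le 1/2$, we guarantee that

$$\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}}) \le 1 + 4e^{-\frac{M^\alpha}{2c^\alpha}}.$$

Since $M \ge c(2\log n)^{1/\alpha}$, we deduce

$$\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}}) \le 1 + \frac{4}{n}.$$

This gives

$$\big(\mathbb{E}\exp(\lambda^\alpha|\xi_0|^\alpha 1_{\{|\xi_0|>M\}})\big)^n \le \Big(1 + \frac{4}{n}\Big)^n \le \exp(4), \qquad (22)$$

which together with (18) and (21) ends the proof. □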
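The two numerical facts that close the proof of Lemma 1, namely that the exponent in (21) is at most 4 when $\lambda^\alpha c^\alpha \le 1/2$ and that $(1 + 4/n)^n \le e^4$, can be checked on a grid. The sketch below (hypothetical name) evaluates both exponents with the convention $\log n := \log(n \vee e)$, and also checks that $\alpha^{-2}e^{-1/\alpha}$ is maximized at $\alpha = 1/2$.

```python
import math


def lemma1_exponents(alpha: float, n: int, lam_c_alpha: float = 0.5):
    """The two quantities bounded by 4 in the proof of Lemma 1 (log n := log(max(n, e))).

    first  = 3 * 4^alpha * (lambda c)^alpha * (4/e^2) * log n / n^{2/alpha - 1}   (cf. (21))
    second = n * log(1 + 4/n), the logarithm of (1 + 4/n)^n                       (cf. (22))
    """
    log_n = math.log(max(n, math.e))
    first = (3.0 * 4.0 ** alpha * lam_c_alpha * (4.0 / math.e ** 2)
             * log_n / n ** (2.0 / alpha - 1.0))
    second = n * math.log1p(4.0 / n)
    return first, second


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 1.0):
        print(alpha, [lemma1_exponents(alpha, n) for n in (1, 10, 1000)])
```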

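Lemma 3. Let $\xi_0, \dots, \xi_{n-1}$ be i.i.d. mean zero random variables bounded by $M$ with variance $\sigma^2$, and let $T \le n$ be a stopping time (with respect to some filtration $\mathcal{F}_i \supseteq \sigma(\xi_0, \dots, \xi_{i-1})$ such that $\xi_i$ is independent of $\mathcal{F}_i$). Then for every $a > 0$, $\varepsilon \in (0, 1)$ and

$$\lambda \le \min\Big(\frac{3}{2M(1+\varepsilon)}, \frac{\sqrt{\varepsilon}}{(1+\varepsilon)\sigma\sqrt{\|(T-a)_+\|_{\psi_1}}}\Big), \qquad (23)$$

we have

$$\mathbb{E}\exp\Big(\Big|\lambda\sum_{i=1}^{T}\xi_{i-1}\Big|\Big) \le 2^{1+\varepsilon/(1+\varepsilon)}\exp\Big(\frac{\lambda^2 a(1+\varepsilon)\sigma^2}{2(1-\lambda(1+\varepsilon)M/3)}\Big).$$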
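Proof. Let us consider the martingale

$$M_k = \frac{\exp(\lambda(1+\varepsilon)\sum_{i=1}^k\xi_{i-1})}{(\mathbb{E}\exp(\lambda(1+\varepsilon)\xi_0))^k}.$$

By Doob's optional stopping theorem $\mathbb{E}M_T = 1$, and thus by the Hölder inequality (with exponents $1+\varepsilon$ and $(1+\varepsilon)/\varepsilon$)

$$\mathbb{E}\exp\Big(\lambda\sum_{i=1}^T\xi_{i-1}\Big) = \mathbb{E}\Big[M_T^{1/(1+\varepsilon)}\big(\mathbb{E}\exp(\lambda(1+\varepsilon)\xi_0)\big)^{T/(1+\varepsilon)}\Big] \le (\mathbb{E}M_T)^{1/(1+\varepsilon)}\Big(\mathbb{E}\big(\mathbb{E}\exp(\lambda(1+\varepsilon)\xi_0)\big)^{T/\varepsilon}\Big)^{\varepsilon/(1+\varepsilon)} = \Big(\mathbb{E}\big(\mathbb{E}\exp(\lambda(1+\varepsilon)\xi_0)\big)^{T/\varepsilon}\Big)^{\varepsilon/(1+\varepsilon)}.$$

The classical Bernstein bound gives

$$\mathbb{E}\exp(\lambda(1+\varepsilon)\xi_0) \le \exp\Big(\frac{\lambda^2(1+\varepsilon)^2\sigma^2}{2(1-\lambda(1+\varepsilon)M/3)}\Big)$$

and thus, writing $T \le a + (T-a)_+$,

$$\mathbb{E}\exp\Big(\lambda\sum_{i=1}^T\xi_{i-1}\Big) \le \exp\Big(\frac{\lambda^2(1+\varepsilon)a\sigma^2}{2(1-\lambda(1+\varepsilon)M/3)}\Big)\Big(\mathbb{E}\exp\Big(\frac{\lambda^2(1+\varepsilon)^2\sigma^2(T-a)_+}{2\varepsilon(1-\lambda(1+\varepsilon)M/3)}\Big)\Big)^{\varepsilon/(1+\varepsilon)}.$$

The last factor is at most $2^{\varepsilon/(1+\varepsilon)}$ as soon as $\frac{\lambda^2(1+\varepsilon)^2\sigma^2}{2\varepsilon(1-\lambda(1+\varepsilon)M/3)} \le \|(T-a)_+\|_{\psi_1}^{-1}$, which is equivalent to

$$\lambda \le \frac{\varepsilon M}{3(1+\varepsilon)\sigma^2\|(T-a)_+\|_{\psi_1}}\Big[-1 + \sqrt{1 + \frac{18\sigma^2\|(T-a)_+\|_{\psi_1}}{M^2\varepsilon}}\Big]. \qquad (24)$$

Thus if (24) holds, then the inequality in question holds (the same bound applied to $-\xi_i$ takes care of the absolute value). Note that (23) implies (24) (one can also see directly that (23) implies the assertion of the lemma). □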
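Both thresholds are explicit, so the claim that (23) implies (24) is easy to test numerically. The sketch below (hypothetical names; `psi1` stands for $\|(T-a)_+\|_{\psi_1}$) evaluates both right-hand sides and checks that at the threshold (24) the exponent coefficient equals $\|(T-a)_+\|_{\psi_1}^{-1}$, which is what makes the last factor in the proof at most $2^{\varepsilon/(1+\varepsilon)}$.

```python
import math


def lambda_23(M: float, sigma: float, psi1: float, eps: float) -> float:
    """Right-hand side of (23); psi1 stands for ||(T - a)_+||_{psi_1} > 0."""
    return min(3.0 / (2.0 * M * (1.0 + eps)),
               math.sqrt(eps) / ((1.0 + eps) * sigma * math.sqrt(psi1)))


def lambda_24(M: float, sigma: float, psi1: float, eps: float) -> float:
    """Right-hand side of (24): the positive root of the quadratic condition."""
    k = sigma ** 2 * psi1
    return eps * M / (3.0 * (1.0 + eps) * k) * (-1.0 + math.sqrt(1.0 + 18.0 * k / (M ** 2 * eps)))


def coefficient(lam: float, M: float, sigma: float, eps: float) -> float:
    """lambda^2 (1+eps)^2 sigma^2 / (2 eps (1 - lambda (1+eps) M / 3)), compared with 1/psi1."""
    x = lam * (1.0 + eps)
    return x * x * sigma ** 2 / (2.0 * eps * (1.0 - x * M / 3.0))


if __name__ == "__main__":
    print(lambda_23(1.0, 1.0, 1.0, 0.5), lambda_24(1.0, 1.0, 1.0, 0.5))
```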

The next proposition gives a bound on the tail of sums of independent or one-dependent random variables with finite $\psi_\alpha$ norms. Its second part may seem to be formulated in a slightly artificial way; however, when dealing with the Markov case this formulation will help us introduce the asymptotic variance of the additive functional into the inequalities.

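Proposition 3. (i) Let $\xi_0,\ldots,\xi_{n-1}$ be a one-dependent sequence of mean zero, variance $\sigma^2$ random variables such that $\mathbb E\exp(c^{-\alpha}|\xi_i|^\alpha)\le2$, and let $m$ be a positive integer. Let $M=c(3\alpha^{-2}\log(n/m))^{1/\alpha}$. Then for any $t>0$,

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=1}^{k}\xi_{i-1}\Big|>t\Big)\le2e^8\exp\Big(-\frac{t^\alpha}{2\cdot4^\alpha c^\alpha}\Big)+4\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2+Mt/6)}\Big).$$

(ii) Assume additionally that the variables $\xi_i$ are independent, and let $N\le n$ be a stopping time with respect to some filtration $\mathcal F_i\supseteq\sigma(\xi_0,\ldots,\xi_{i-1})$ such that $\xi_i$ is independent of $\mathcal F_i$. Let $M=c(3\alpha^{-2}\log n)^{1/\alpha}$. Then for any $\varepsilon\in(0,1)$, $a>0$ and $p,q>1$ such that $p^{-1}+q^{-1}=1$, we have for all $t>0$,

$$\mathbb P\Big(\Big|\sum_{i=1}^{N}\xi_{i-1}\Big|>t\Big)\le e^8\exp\Big(-\frac{t^\alpha}{2p^\alpha c^\alpha}\Big)+2^{1+\varepsilon/(1+\varepsilon)}\exp\Big(-\frac{q^{-2}t^2}{2((1+\varepsilon)a\sigma^2+u^{-1}q^{-1}t)}\Big),$$

where

$$u=\min\Big(\frac{3}{4M(1+\varepsilon)},\ \frac{\sqrt\varepsilon}{(1+\varepsilon)\sigma\sqrt{\|(N-a)_+\|_{\psi_1}}}\Big).$$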

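Proof. First observe that the case of one-dependent random variables can easily be reduced to that of independent ones. Indeed, it suffices to split the sum into its odd and even parts; namely, we use

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=1}^{k}\xi_{i-1}\Big|>t\Big)\le\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}\xi_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t2\Big)+\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}\xi_{2i+1}\mathbf 1_{\{2i+1<k\}}\Big|>\frac t2\Big).\qquad(25)$$

Each of the two sums on the right-hand side has independent summands. Then decompose with respect to $M$, i.e. write $\xi_i=\xi_i\mathbf 1_{\{|\xi_i|>M\}}+\xi_i\mathbf 1_{\{|\xi_i|\le M\}}$ and denote

$$Y_i=\xi_i\mathbf 1_{\{|\xi_i|>M\}}-\mathbb E\xi_i\mathbf 1_{\{|\xi_i|>M\}},\qquad Z_i=\xi_i\mathbf 1_{\{|\xi_i|\le M\}}-\mathbb E\xi_i\mathbf 1_{\{|\xi_i|\le M\}},$$

so that $\xi_i=Y_i+Z_i$ (recall $\mathbb E\xi_i=0$). This results in the following inequality:

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}\xi_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t2\Big)\le\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}Y_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t4\Big)+\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t4\Big).\qquad(26)$$

We use the usual Laplace transform argument on each of the summands. Recall that $\alpha\in(0,1]$, so $|x+y|^\alpha\le|x|^\alpha+|y|^\alpha$. By Markov's inequality and the triangle inequality we have

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}Y_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t4\Big)\le e^{-\lambda^\alpha t^\alpha/4^\alpha}\,\mathbb E\exp\Big(\lambda^\alpha\sum_{i=0}^{\infty}|Y_{2i}|^\alpha\mathbf 1_{\{2i<n/m\}}\Big).$$

By Lemma 1, for $\lambda\le1/(2^{1/\alpha}c)$,

$$\mathbb E\exp\Big(\lambda^\alpha\sum_{i=0}^{\infty}|Y_{2i}|^\alpha\mathbf 1_{\{2i<n/m\}}\Big)\le\exp(8).$$

Taking $\lambda=1/(2^{1/\alpha}c)$, this gives

$$e^{-\lambda^\alpha t^\alpha/4^\alpha}\,\mathbb E\exp\Big(\lambda^\alpha\sum_{i=0}^{\infty}|Y_{2i}|^\alpha\mathbf 1_{\{2i<n/m\}}\Big)\le\exp(8)\exp\Big(-\frac{t^\alpha}{2\cdot4^\alpha c^\alpha}\Big).$$

This shows that the unbounded part of the sum is well concentrated.

Let us now pass to the bounded part. Similarly as before, we have

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t4\Big)\le e^{-\mu t/4}\,\mathbb E\exp\Big(\mu\Big|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<n/m\}}\Big|\Big),$$

where we have used Doob's maximal inequality together with the fact that $M_k=\exp(\mu|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<k\}}|)$ is a submartingale.

Since $|Z_i|\le2M$, we can now use the classical Bernstein bound to get, for $0<\mu<3/(2M)$,

$$\mathbb E\exp\Big(\mu\Big|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<n/m\}}\Big|\Big)\le2\exp\Big(\frac{\mu^2\sigma^2\lceil n/(2m)\rceil}{2(1-2M\mu/3)}\Big),$$

from which it follows that

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}Z_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t4\Big)\le2\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2+Mt/6)}\Big).$$

Combining this inequality with the previous estimate on the unbounded part gives

$$\mathbb P\Big(\sup_{k\le n/m}\Big|\sum_{i=0}^{\infty}\xi_{2i}\mathbf 1_{\{2i<k\}}\Big|>\frac t2\Big)\le e^8\exp\Big(-\frac{t^\alpha}{2\cdot4^\alpha c^\alpha}\Big)+2\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2+Mt/6)}\Big).$$

Using an analogous argument for $\mathbb P(\sup_{k\le n/m}|\sum_{i=0}^{\infty}\xi_{2i+1}\mathbf 1_{\{2i+1<k\}}|>t/2)$ and then (25), we finally derive the inequality asserted in part (i) of the proposition.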

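Let us now consider part (ii). We will use the above notation with obvious modifications, which we will not state explicitly. The difference stems from the fact that there is now no need to split the sum into its 'odd' and 'even' parts. Instead of the symmetric split we take any $p,q>1$ such that $p^{-1}+q^{-1}=1$ and write

$$\mathbb P\Big(\Big|\sum_{i=1}^{N}\xi_{i-1}\Big|>t\Big)\le\mathbb P\Big(\Big|\sum_{i=1}^{N}Y_{i-1}\Big|>p^{-1}t\Big)+\mathbb P\Big(\Big|\sum_{i=1}^{N}Z_{i-1}\Big|>q^{-1}t\Big).\qquad(27)$$

The unbounded part can be handled in the same way as in part (i), and it is bounded by $e^8\exp(-t^\alpha/(2p^\alpha c^\alpha))$.

As for the bounded part, we apply Lemma 3 to the variables $Z_i$, which are bounded by $2M$. For any $a>0$, $\varepsilon\in(0,1)$ and

$$\mu<\mu_1=\min\Big(\frac{3}{4M(1+\varepsilon)},\ \frac{\sqrt\varepsilon}{(1+\varepsilon)\sigma\sqrt{\|(N-a)_+\|_{\psi_1}}}\Big),$$

it gives

$$\mathbb E\exp\Big(\mu\Big|\sum_{i=1}^{N}Z_{i-1}\Big|\Big)\le2^{1+\varepsilon/(1+\varepsilon)}\exp\Big(\frac{\mu^2(1+\varepsilon)a\sigma^2}{2(1-\mu\mu_1^{-1})}\Big),$$

where we used $2(1+\varepsilon)M/3\le\mu_1^{-1}/2\le\mu_1^{-1}$. Thus

$$e^{-\mu q^{-1}t}\,\mathbb E\exp\Big(\mu\Big|\sum_{i=1}^{N}Z_{i-1}\Big|\Big)\le e^{-\mu q^{-1}t}\,2^{1+\varepsilon/(1+\varepsilon)}\exp\Big(\frac{\mu^2(1+\varepsilon)a\sigma^2}{2(1-\mu\mu_1^{-1})}\Big).$$

We now argue as in the proof of Bernstein's inequality, setting $\mu=q^{-1}t/((1+\varepsilon)a\sigma^2+\mu_1^{-1}q^{-1}t)$, which satisfies $\mu<\mu_1$. This yields

$$\mathbb P\Big(\Big|\sum_{i=1}^{N}Z_{i-1}\Big|>q^{-1}t\Big)\le2^{1+\varepsilon/(1+\varepsilon)}\exp\Big(-\frac{q^{-2}t^2}{2((1+\varepsilon)a\sigma^2+\mu_1^{-1}q^{-1}t)}\Big),$$

which ends the proof of part (ii). □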
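The choice of $\mu$ at the end of part (ii) is the standard Bernstein optimization. The minimal Python sketch below uses hypothetical names and writes $x=q^{-1}t$, $v=(1+\varepsilon)a\sigma^2$ and $b=\mu_1^{-1}$. It omits the prefactor $2^{1+\varepsilon/(1+\varepsilon)}$ and confirms two points: the choice is admissible ($\mu b<1$), and it turns the exponent into $-x^2/(2(v+bx))$.

```python
import math

def log_laplace_bound(mu, x, v, b):
    """log of exp(-mu x) * exp(mu^2 v / (2 (1 - mu b))), defined for 0 <= mu < 1/b."""
    if not 0.0 <= mu * b < 1.0:
        raise ValueError("mu must satisfy 0 <= mu < 1/b")
    return -mu * x + mu ** 2 * v / (2.0 * (1.0 - mu * b))

def bernstein_exponent(x, v, b):
    """Exponent obtained with Bernstein's choice mu = x / (v + b x);
    this mu is automatically smaller than 1/b."""
    mu = x / (v + b * x)
    return log_laplace_bound(mu, x, v, b)

# x = t/q, v = (1 + eps) a sigma^2, b = 1/mu_1 (illustrative values)
x, v, b = 3.0, 2.0, 0.5
assert math.isclose(bernstein_exponent(x, v, b), -x ** 2 / (2.0 * (v + b * x)))
```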
Remark. We note that by modifying the martingale argument in the proof of part (ii) of Proposition 3 one can show a similar inequality under the assumptions of part (i).

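Let us now pass to the corresponding results for suprema of empirical processes. We will consider a countable class $\mathcal F$ of functions $f\colon\mathcal X\to\mathbb R$; countability is important only for measurability purposes and can clearly be relaxed. Let $F(x)=\sup_{f\in\mathcal F}|f(x)|$ be the envelope of $\mathcal F$. Our goal will be to obtain exponential bounds for $\mathbb P(S^*_{\mathcal F}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+t)$, where

$$S_{\mathcal F}=\Big\|\sum_{i=1}^{n}f(\xi_{i-1})\Big\|_{\mathcal F}:=\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{n}f(\xi_{i-1})\Big|$$

and

$$S^*_{\mathcal F}=\max_{k\le n}\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{k}f(\xi_{i-1})\Big|$$

(the $\xi_i$ are independent random variables).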

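Our main tool will be the following version of Talagrand's inequality, obtained by Klein and Rio [11] in the case of bounded summands.

Theorem 4. Let $\xi_0,\ldots,\xi_{n-1}$ be i.i.d. random variables and assume that $\mathcal F$ is a countable class of functions with an envelope $F$, such that $\mathbb Ef(\xi_0)=0$ for all $f\in\mathcal F$ and $|F(\xi_i)|\le M$ a.s. Then for any $0\le\lambda<2/(3M)$,

$$\log\mathbb E\exp\big(\lambda(S_{\mathcal F}-\mathbb ES_{\mathcal F})\big)\le\frac{\lambda^2(n\sigma^2+2M\mathbb ES_{\mathcal F})}{2-3\lambda M},$$

where $\sigma^2=\sup_{f\in\mathcal F}\mathbb Ef(\xi_0)^2$.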

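Similarly as in [9] (formula (3.2)), we use three ingredients: the fact that $\exp(\lambda\sup_{f\in\mathcal F}|\sum_{i=1}^{k}f(\xi_{i-1})|)$, $k=1,\ldots,n$, is a submartingale for $\lambda\ge0$, Doob's maximal inequality, and Bernstein's approach. Together they give the following.

Corollary 2. In the setting of Theorem 4, for any $t>0$,

$$\mathbb P(S^*_{\mathcal F}\ge\mathbb ES_{\mathcal F}+t)\le\exp\Big(-\frac{t^2}{2\sigma^2n+(4\mathbb ES_{\mathcal F}+3t)M}\Big),$$

and in consequence, for any $\varepsilon>0$,

$$\mathbb P(S^*_{\mathcal F}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+t)\le\exp\Big(-\frac{t^2}{2(1+\varepsilon)n\sigma^2}\Big)+\exp\Big(-\frac{t}{MD_\varepsilon}\Big),$$

where $D_\varepsilon=(1+\varepsilon^{-1})(3+4\varepsilon^{-1})$.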
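The passage from the first to the second inequality of Corollary 2 uses only two facts: $A+B\le\max((1+\varepsilon)A,(1+\varepsilon^{-1})B)$, and $\mathbb ES_{\mathcal F}\le\varepsilon^{-1}(\varepsilon\mathbb ES_{\mathcal F}+t)$. The Python sketch below is illustrative, with arbitrary random parameters and function names of our choosing. It checks numerically that evaluating the first bound at level $\varepsilon\mathbb ES_{\mathcal F}+t$ never exceeds the second bound.

```python
import math
import random

def tail_first(t, n, sigma2, ES, M):
    """First bound of Corollary 2 for P(S* >= ES + t)."""
    return math.exp(-t ** 2 / (2.0 * sigma2 * n + (4.0 * ES + 3.0 * t) * M))

def tail_split(t, n, sigma2, M, eps):
    """Second bound of Corollary 2 for P(S* >= (1 + eps) ES + t)."""
    D = (1.0 + 1.0 / eps) * (3.0 + 4.0 / eps)
    return math.exp(-t ** 2 / (2.0 * (1.0 + eps) * n * sigma2)) + math.exp(-t / (M * D))

rng = random.Random(1)
for _ in range(10_000):
    n = rng.randint(1, 10_000)
    sigma2, M = rng.uniform(0.01, 5.0), rng.uniform(0.01, 5.0)
    ES, t, eps = rng.uniform(0.0, 100.0), rng.uniform(0.0, 500.0), rng.uniform(0.01, 2.0)
    # The first bound at level eps*ES + t is dominated by the split form.
    assert tail_first(eps * ES + t, n, sigma2, ES, M) <= tail_split(t, n, sigma2, M, eps) + 1e-15
```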

Two important aspects of the above inequalities stand out. First, the subgaussian behavior of the tail estimate for small $t$ is governed by the weak variance $\sigma^2$. Second, the parameter $\varepsilon$ can be taken arbitrarily small, which shows in particular that for Donsker classes $\mathcal F$ the concentration properties of the empirical process are almost as good as for the limiting Gaussian process. Our next goal is to give tools which will allow us to prove similar inequalities for suprema of additive functionals of Markov chains.

Combining Corollary 2 with Lemma 1 we obtain
Proposition 4. Consider i.i.d. random variables $\xi_0,\ldots,\xi_{n-1}$ and assume that $\mathcal F$ is a countable class of functions with an envelope $F$, such that $\mathbb Ef(\xi_0)=0$ for all $f\in\mathcal F$ and $\mathbb E\exp(c^{-\alpha}F(\xi_i)^\alpha)\le2$. Let $M=c(3\alpha^{-2}\log n)^{1/\alpha}$ and $\sigma^2=\sup_{f\in\mathcal F}\mathbb Ef(\xi_0)^2$. Then for any $\varepsilon\in(0,1/2)$ and $t>0$,

$$\mathbb P(S^*_{\mathcal F}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+t)\le\exp\Big(-\frac{(1-2\varepsilon)^2t^2}{2(1+\varepsilon)n\sigma^2}\Big)+\exp\Big(-\frac{t(1-2\varepsilon)}{2M(1+\varepsilon^{-1})(3+4\varepsilon^{-1})}\Big)+e^8\exp\Big(-\frac{(\varepsilon t)^\alpha}{2c^\alpha}\Big).$$

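Proof. For $f\in\mathcal F$ define the functions

$$f_1(x)=f(x)\mathbf 1_{\{|F(x)|\le M\}}-\mathbb Ef(\xi_0)\mathbf 1_{\{|F(\xi_0)|\le M\}},$$

$$f_2(x)=f(x)-f_1(x)=f(x)\mathbf 1_{\{|F(x)|>M\}}-\mathbb Ef(\xi_0)\mathbf 1_{\{|F(\xi_0)|>M\}}.$$

Let $\mathcal F_i=\{f_i:f\in\mathcal F\}$, $i=1,2$. Clearly $S^*_{\mathcal F}\le S^*_{\mathcal F_1}+S^*_{\mathcal F_2}$, and thus

$$\mathbb P(S^*_{\mathcal F}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+t)\le\mathbb P(S^*_{\mathcal F_1}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+(1-\varepsilon)t)+\mathbb P(S^*_{\mathcal F_2}>\varepsilon t).$$

We have

$$S^*_{\mathcal F_2}\le\sum_{i=1}^{n}\big(F(\xi_{i-1})\mathbf 1_{\{|F(\xi_{i-1})|>M\}}+\mathbb EF(\xi_{i-1})\mathbf 1_{\{|F(\xi_{i-1})|>M\}}\big),$$

and so by Lemma 1 and Chebyshev's inequality we get

$$\mathbb P(S^*_{\mathcal F_2}>\varepsilon t)\le e^8e^{-(\varepsilon t)^\alpha/(2c^\alpha)}.\qquad(28)$$

Let us note that without loss of generality we can assume that $t\ge16M\varepsilon^{-1}$ (otherwise the right-hand side of the inequality in question exceeds one). Therefore

$$\begin{aligned}
(1+\varepsilon)\mathbb ES_{\mathcal F}&\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}-2\mathbb ES_{\mathcal F_2}\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}-4n\mathbb EF(\xi_0)\mathbf 1_{\{F(\xi_0)>M\}}\\
&\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}-16Mn\exp(-M^\alpha/c^\alpha)\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}-16M\\
&\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}-\varepsilon t,
\end{aligned}$$

where the third inequality follows from Lemma 2, and the fourth from $M^\alpha/c^\alpha=3\alpha^{-2}\log n\ge\log n$.

For every $f\in\mathcal F$ we have $\|f_1\|_\infty\le2M$ and $\mathbb Ef_1(\xi_0)^2\le\mathbb Ef(\xi_0)^2$. Taking this into account, Corollary 2 gives

$$\begin{aligned}
\mathbb P(S^*_{\mathcal F_1}\ge(1+\varepsilon)\mathbb ES_{\mathcal F}+(1-\varepsilon)t)&\le\mathbb P(S^*_{\mathcal F_1}\ge(1+\varepsilon)\mathbb ES_{\mathcal F_1}+(1-2\varepsilon)t)\\
&\le\exp\Big(-\frac{(1-2\varepsilon)^2t^2}{2(1+\varepsilon)n\sigma^2}\Big)+\exp\Big(-\frac{t(1-2\varepsilon)}{2MD_\varepsilon}\Big),
\end{aligned}$$

which ends the proof of the proposition. □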

5 Exponential concentration for additive functionals of Markov chains 

In this section we will prove tail estimates for Markov chains expressed in terms of the quantities introduced in Section 2.






5.1 Additive functionals 

We will start from the most general of our inequalities and later we will add assumptions under which 
we are able to improve some aspects of the estimates. 

The inequalities we present are expressed in terms of the parameters $a$, $b$, $c$ introduced in Section 2. Our discussion in Section 2 shows that there exist bounds on $a$, $b$, $c$ in terms of $A$, $B$, $C$, $D$ (Proposition 1). These in turn can be estimated via drift conditions (see the corresponding theorems in Section 2). We do not plug those inequalities into our tail estimates so as not to obscure the already quite involved formulas.

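Theorem 5. Let $(X_n)$ be an ergodic Markov chain and $(X_n^m)$ its $m$-skeleton used in the split chain construction. Assume for simplicity that $m\mid n$ and set $M=c(3\alpha^{-2}\log(n/m))^{1/\alpha}$. Then the following inequality holds for all $t>0$:

$$\begin{aligned}
\mathbb P_x\Big(\Big|\sum_{i=0}^{n-1}f(X_i)\Big|>3t\Big)\le{}&2\exp\Big(-\frac{t^\alpha}{a^\alpha}\Big)+2\pi^*(\alpha)^{-1}\exp\Big(-\frac{t^\alpha}{b^\alpha}\Big)+2e^8\exp\Big(-\frac{t^\alpha}{2\cdot4^\alpha c^\alpha}\Big)\\
&+4\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2+Mt/6)}\Big),
\end{aligned}$$

where $a$, $b$, $c$ have been defined in Section 2 and $\sigma^2=\mathbb Es_1(f)^2$.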

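Proof. We have introduced all the tools necessary to prove the exponential concentration. By the construction of the split chain,

$$\mathbb P_x\Big(\Big|\sum_{i=0}^{n-1}f(X_i)\Big|>3t\Big)=\mathbb P_{x^*}\Big(\Big|\sum_{i=0}^{n-1}f(X_i)\Big|>3t\Big).$$

In (2) we have decomposed $\sum_{i=0}^{n-1}f(X_i)$ into three summands,

$$\Big|\sum_{k=0}^{n-1}f(X_k)\Big|\le U_n(f)+V_n(f)+W_n(f),$$

therefore to complete our proof it suffices to bound

$$\mathbb P_{x^*}(|U_n|>t),\qquad\mathbb P_{x^*}(|V_n|>t),\qquad\mathbb P_{x^*}(|W_n|>t).$$

Since we have assumed that $m\mid n$, all these quantities can be expressed in terms of $Z_k(f)$, $k\ge0$, namely

$$U_n(f)=\Big|\sum_{k=0}^{\min(\sigma(0),\,n/m-1)}Z_k(f)\Big|,\qquad V_n(f)=\Big|\sum_{i=1}^{N}s_i(f)\Big|,\qquad W_n(f)=\Big|\mathbf 1_{\{N>0\}}\sum_{k=\sigma(N)+1}^{n/m-1}Z_k(f)\Big|.$$

To bound the first term note that

$$\mathbb P_{x^*}(|U_n(f)|>t)\le2\exp\Big(-\frac{t^\alpha}{a^\alpha}\Big),\qquad(29)$$

where $a=\|\sum_{k=0}^{\sigma(0)}|Z_k(f)|\|_{\psi_\alpha,\mathbb P_{x^*}}$.

Estimating $W_n(f)$ is more involved. By Pitman's occupation measure formula ([El], [20]; see also [16, Theorem 10.0.1]) we get

$$\sum_{l=1}^{\infty}\mathbb E_\alpha\Big[\exp\Big(b^{-\alpha}\Big|\sum_{k=1}^{l}Z_k(f)\Big|^\alpha\Big)\mathbf 1_{\{\tau_\alpha\ge l\}}\Big]\le\pi^*(\alpha)^{-1}\,\mathbb E_{\pi^*}\exp\Big(b^{-\alpha}\Big(\sum_{k=0}^{\sigma(0)}|Z_k(f)|\Big)^\alpha\Big),$$

and it follows that

$$\begin{aligned}
\mathbb P_{x^*}(|W_n(f)|>t)&\le\sum_{l=1}^{n/m-1}\mathbb P_{x^*}(X^m_{n/m-l}\in\alpha)\,\mathbb P_\alpha\Big(\Big|\sum_{k=1}^{l}Z_k(f)\Big|>t,\ \tau_\alpha\ge l\Big)\\
&\le\sum_{l=1}^{n/m-1}\mathbb P_{x^*}(X^m_{n/m-l}\in\alpha)\,e^{-t^\alpha/b^\alpha}\,\mathbb E_\alpha\Big[\mathbf 1_{\{\tau_\alpha\ge l\}}\exp\Big(b^{-\alpha}\Big|\sum_{k=1}^{l}Z_k(f)\Big|^\alpha\Big)\Big]\\
&\le\Big[\max_{1\le l\le n/m-1}\mathbb P_{x^*}(X^m_{n/m-l}\in\alpha)\Big]\,e^{-t^\alpha/b^\alpha}\,\pi^*(\alpha)^{-1}\,\mathbb E_{\pi^*}\exp\Big(b^{-\alpha}\Big(\sum_{k=0}^{\sigma(0)}|Z_k(f)|\Big)^\alpha\Big).
\end{aligned}$$

Obviously

$$\max_{1\le l\le n/m-1}\mathbb P_{x^*}(X^m_{n/m-l}\in\alpha)\le1.$$

We stress, however, that $\lim_{n\to\infty}\mathbb P_{x^*}(X^m_{n/m-l}\in\alpha)=\pi^*(\alpha)$, so the bound can usually be much improved if we wait for ergodicity to take effect. In the general case the above shows that

$$\mathbb P_{x^*}(|W_n(f)|>t)\le e^{-t^\alpha/b^\alpha}\Big[\pi^*(\alpha)^{-1}\,\mathbb E_{\pi^*}\exp\Big(b^{-\alpha}\Big(\sum_{k=0}^{\sigma(0)}|Z_k(f)|\Big)^\alpha\Big)\Big],$$

so by the definition of $b$ it yields

$$\mathbb P_{x^*}(|W_n(f)|>t)\le2\pi^*(\alpha)^{-1}e^{-t^\alpha/b^\alpha}.\qquad(30)$$

Finally we should give a bound on $\mathbb P_{x^*}(|V_n|>t)$, but this is exactly the setting of our main tool. If we set $\xi_i=s_i$ and $c=\|s_1(f)\|_{\psi_\alpha,\mathbb P_{x^*}}$, then Proposition 3 gives

$$\mathbb P_{x^*}(|V_n|>t)\le2e^8\exp\Big(-\frac{t^\alpha}{2\cdot4^\alpha c^\alpha}\Big)+4\exp\Big(-\frac{t^2}{32(\lceil n/(2m)\rceil\sigma^2+Mt/6)}\Big),$$

where $\sigma^2=\mathbb Es_1^2$. □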
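For concreteness, the hedged Python sketch below evaluates the right-hand side of Theorem 5 as combined from (29), (30) and Proposition 3(i). It assumes that the parameters $a$, $b$, $c$, $\sigma^2$ and $\pi^*(\alpha)$ are known; in practice they have to be bounded, e.g. via the drift conditions of Section 2. The function name and argument names are ours.

```python
import math

def theorem5_bound(t, n, m, a, b, c, alpha, sigma2, pi_atom):
    """Right-hand side of Theorem 5, bounding P_x(|sum_{i<n} f(X_i)| > 3t).

    a, b, c : the psi_alpha parameters of Section 2 (assumed known);
    sigma2  : E s_1(f)^2;  pi_atom : pi*(alpha), the stationary mass of the atom.
    """
    assert n % m == 0 and n > m and 0.0 < alpha <= 1.0
    M = c * (3.0 * alpha ** -2 * math.log(n / m)) ** (1.0 / alpha)
    blocks = math.ceil(n / (2 * m))                      # ceil(n / (2m))
    return (2.0 * math.exp(-(t / a) ** alpha)                                    # U_n, (29)
            + 2.0 / pi_atom * math.exp(-(t / b) ** alpha)                        # W_n, (30)
            + 2.0 * math.exp(8.0) * math.exp(-t ** alpha / (2.0 * 4.0 ** alpha * c ** alpha))
            + 4.0 * math.exp(-t ** 2 / (32.0 * (blocks * sigma2 + M * t / 6.0))))

# Example evaluation with illustrative parameter values.
example = theorem5_bound(t=8000.0, n=10**6, m=1, a=1.0, b=1.0, c=1.0,
                         alpha=1.0, sigma2=1.0, pi_atom=0.5)
```

Each of the four terms is decreasing in $t$, so the bound can be inverted numerically, e.g. by bisection in $t$, to reach a prescribed confidence level.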

One of the important features of the classical Bernstein inequality is that for small $t$ (i.e. $t\ll n$) it provides a subgaussian tail estimate whose subgaussian coefficient is given by the variance of the i.i.d. summands. Thus it shows that for some range of $t$ the tail estimate is almost as good as for the limiting Gaussian variable.

Note that for $m>1$ the subgaussian coefficient in Theorem 5 differs from the asymptotic variance, i.e. the variance of the limiting Gaussian distribution. As is well known (see e.g. [6, 16]), the asymptotic variance is equal to $\mathbb Es_0(f)^2+2\mathbb Es_0(f)s_1(f)$. Moreover, even in the case $m=1$ the subgaussian coefficient remains bounded away from the asymptotic variance, since the constant in front of $n\sigma^2$ is not equal to 2.

We would now like to provide an estimate for additive functionals with the subgaussian coefficient arbitrarily close to $2\sigma^2$. To achieve this we will have to make additional assumptions on the Markov chain. Namely, we will assume that it is geometrically ergodic, i.e. that $\|\sigma(0)+1\|_{\psi_1,\mathbb P_{\nu^*}}=\|\sigma(1)-\sigma(0)\|_{\psi_1}<\infty$ (recall that $\nu$ is the minorizing measure used in the splitting construction), and that $m=1$ (i.e. the chain is strongly aperiodic).

Our proof will rely on the second part of Proposition 3. In order to use it we will first show that the stopping time $N$ does not deviate much from $n\pi^*(\alpha)$.

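Lemma 4 (Bernstein's $\psi_1$ inequality). If $\xi_0,\ldots,\xi_{n-1}$ are independent mean zero random variables such that $\|\xi_i\|_{\psi_1}\le c$, then for all $t>0$,

$$\mathbb P\Big(\Big|\sum_{i=0}^{n-1}\xi_i\Big|>t\Big)\le2\exp\Big(-\frac{t^2}{4nc^2+2ct}\Big),$$

and each of the one-sided probabilities $\mathbb P(\pm\sum_{i=0}^{n-1}\xi_i>t)$ is bounded by the same expression without the factor 2.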

Before we formulate the next lemma, let us introduce one more parameter, which quantifies the geometric ergodicity of the chain. Namely, define

$$d=\|\sigma(1)-\sigma(0)\|_{\psi_1}=\|\sigma(0)+1\|_{\psi_1,\mathbb P_{\nu^*}}<\infty\qquad(31)$$

(note that this definition does not depend on the initial distribution of the chain).

As is well known, geometric ergodicity is equivalent to the finiteness of $d$, which can be effectively bounded using classical drift conditions (see e.g. [16]). Let us remark in passing that those drift conditions can be obtained from condition (16) in the special case $f\equiv1$.

Recall the stopping time $N$ defined in (3), and note that by the law of large numbers, for $n\gg1$ it should behave like $\pi^*(\alpha)n$. The next lemma quantifies this intuition and gives a bound on the deviations of $N$.
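To illustrate the intuition that $N\approx\pi^*(\alpha)n$, here is a toy Python simulation. It is not the split-chain construction. We simply assume, hypothetically, that the regeneration times form a renewal sequence $\sigma(0)=0$, $\sigma(i)=T_1+\dots+T_i$ with i.i.d. geometric increments of mean $1/p$, so that $p$ plays the role of $\pi^*(\alpha)$. As a stand-in for $N$ we count the regenerations in $[0,n-1]$.

```python
import random

def count_regenerations(n, p, rng):
    """Toy proxy for N: the number of i >= 1 with sigma(i) <= n - 1, where
    sigma(0) = 0 and the increments sigma(i) - sigma(i-1) are i.i.d. geometric
    on {1, 2, ...} with mean 1/p (hypothetical model; p plays the role of pi*(alpha))."""
    position, count = 0, 0
    while True:
        step = 1
        while rng.random() >= p:   # each trial succeeds with probability p
            step += 1
        position += step
        if position > n - 1:
            return count
        count += 1

rng = random.Random(2)
# By the law of large numbers these ratios approach p = 0.25 as n grows.
ratios = [count_regenerations(n, 0.25, rng) / n for n in (1_000, 10_000, 100_000)]
```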

Lemma 5. Assume that $m=1$ and the Markov chain is geometrically ergodic. Then for any $\varepsilon\in(0,1)$ and every integer $k\ge\pi^*(\alpha)n(1+\varepsilon)$,

$$\mathbb P_{x^*}(N>k)\le\exp\Big(-\frac{k-\pi^*(\alpha)n}{36\pi^*(\alpha)^2d^2\varepsilon^{-1}}\Big).$$

In consequence,

$$\|(N-(1+\varepsilon)\pi^*(\alpha)n)_+\|_{\psi_1}\le144\pi^*(\alpha)^2d^2\varepsilon^{-1}.$$

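Proof. Set $T_i=\sigma(i)-\sigma(i-1)$ for $i\ge1$, and note that the $T_i$ are i.i.d. random variables with common mean $\mathbb ET_i=\pi^*(\alpha)^{-1}\le d$. We have $\|T_i-\mathbb ET_i\|_{\psi_1}\le2d$. Thus for every integer $k=\pi^*(\alpha)n(1+\varepsilon+t)$, $t\ge0$, Lemma 4 (in its one-sided form) gives

$$\begin{aligned}
\mathbb P_{x^*}(N>k)&=\mathbb P_{x^*}(\sigma(k)<n-1)\le\mathbb P_{x^*}\Big(\sum_{i=1}^{k}T_i<n-1\Big)\\
&=\mathbb P_{x^*}\Big(\sum_{i=1}^{k}(T_i-\pi^*(\alpha)^{-1})<n-1-\pi^*(\alpha)^{-1}k\Big)\\
&=\mathbb P_{x^*}\Big(\sum_{i=1}^{k}(T_i-\pi^*(\alpha)^{-1})<-n(\varepsilon+t)-1\Big)\\
&\le\exp\Big(-\frac{n^2(\varepsilon+t)^2}{16d^2\pi^*(\alpha)n(1+\varepsilon+t)+4(\varepsilon+t)nd}\Big).
\end{aligned}$$

Taking into account that $\pi^*(\alpha)d\ge1$ and $(\varepsilon+t+1)/(\varepsilon+t)\le2/\varepsilon$, this yields

$$\mathbb P_{x^*}(N>k)\le\exp\Big(-\frac{\pi^*(\alpha)n(\varepsilon+t)}{36\pi^*(\alpha)^2d^2\varepsilon^{-1}}\Big)=\exp\Big(-\frac{k-\pi^*(\alpha)n}{36\pi^*(\alpha)^2d^2\varepsilon^{-1}}\Big).$$

Now let $Z=(N-(1+\varepsilon)\pi^*(\alpha)n)_+$ and $a=36\pi^*(\alpha)^2d^2\varepsilon^{-1}$. The first bound is applied only for $t\ge1/(2a)$, where $2at\ge1$ absorbs the rounding to an integer $k$. We have

$$\begin{aligned}
\mathbb E\exp\Big(\frac{Z}{2a}\Big)&=1+\int_0^\infty e^t\,\mathbb P_{x^*}(Z>2at)\,dt\le1+\int_0^{1/(2a)}e^t\,dt+\int_{1/(2a)}^\infty\exp\Big(t-\frac{2at-1}{a}\Big)dt\\
&=e^{1/(2a)}+e^{1/a}e^{-1/(2a)}=2e^{1/(2a)}\le4.
\end{aligned}$$

By Jensen's inequality, $\mathbb E\exp(Z/(4a))\le(\mathbb E\exp(Z/(2a)))^{1/2}\le2$, i.e. $\|Z\|_{\psi_1}\le4a=144\pi^*(\alpha)^2d^2\varepsilon^{-1}$, which proves the lemma. □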
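The closing computation can be double-checked numerically. The Python sketch below integrates $e^t$ against the tail estimate used in the proof, taken to be 1 on $[0,1/(2a)]$, and compares the result with the closed form $2e^{1/(2a)}$. The sample values $a\ge36$ reflect $\pi^*(\alpha)d\ge1$ and $\varepsilon<1$. The truncation point of the integral is our choice.

```python
import math

def trapezoid(g, lo, hi, steps=100_000):
    """Composite trapezoidal rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for j in range(1, steps):
        total += g(lo + j * h)
    return total * h

def mgf_numeric(a, upper=60.0):
    """1 + int_0^inf e^t P(Z > 2at) dt with the tail estimate of the proof:
    1 on [0, 1/(2a)] and exp(-(2at - 1)/a) afterwards; truncated at `upper`,
    where the integrand is of order e^{-upper}."""
    split = 1.0 / (2.0 * a)
    head = trapezoid(math.exp, 0.0, split)
    tail = trapezoid(lambda t: math.exp(t - (2.0 * a * t - 1.0) / a), split, upper)
    return 1.0 + head + tail

def mgf_closed(a):
    """The closed form 2 e^{1/(2a)} obtained in the proof."""
    return 2.0 * math.exp(1.0 / (2.0 * a))

for a in (36.0, 100.0, 1000.0):      # a = 36 pi*(alpha)^2 d^2 / eps >= 36
    assert math.isclose(mgf_numeric(a), mgf_closed(a), rel_tol=1e-6)
    assert mgf_closed(a) <= 4.0
    assert math.sqrt(mgf_closed(a)) <= 2.0   # Jensen: E exp(Z/(4a)) <= 2
```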

Repeating the proof of Theorem 5 with the second part of Proposition 3 in place of the first, we get the following theorem. Note that by playing with the parameters $p$ and $\varepsilon$ we can now make the subgaussian coefficient arbitrarily close to the optimal one (given by the CLT), at the cost of worsening the remaining constants.

Theorem 6. Let $(X_n)$ be a strongly aperiodic ergodic Markov chain and $(X_n^1)$ its split version. Let $p,q>1$ satisfy $p^{-1}+q^{-1}=1$ and $\varepsilon\in(0,1)$. Then the following inequality holds for all $t>0$,

P X (|^/(X,)| >t)< 2exp(-^|^) + 2+»- 1 exp( (// '"' )n ■ 



(2a)" ' * ' ^ (2b)* 



e 8 eX p(- ^ } L > ) + 2 1+£ /( 1+£ ) exp(- 



where 
and 



2c a ^ K 2({l +e)a 2 n + M{e)tq- 2 ) 

$$\sigma^2=\sigma^2(f)=\pi^*(\alpha)\mathbb Es_0(f)^2$$



/4c(3a- 2 logn) 1 / a (l + e) 12vr*(a)d(l + e)a 
M {s) — max ^ , 
5.2 Empirical processes

In this section we present estimates for empirical processes of geometrically ergodic Markov chains. The interest in this type of random variables stems from their wide applicability in statistics and machine learning theory, e.g. model selection, M-estimation and other statistical methods based on the minimization of some empirical criteria.

We want to obtain concentration inequalities for chains started from arbitrary initial conditions. So, in addition to the parameter $d$, we will introduce its counterpart, which measures how quickly the chain regenerates for a given starting point:

$$e=\|\sigma(0)\|_{\psi_1,\mathbb P_{x^*}}.$$

We need this parameter because in the empirical process case we cannot use the martingale argument of the second part of Proposition 3. To obtain inequalities with the subgaussian term close to the optimal one, we therefore have to consider separately the cases of large and small $N$. Our main result concerning empirical processes of Markov chains is the following.

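Theorem 7. Let $(X_n)$ be a strongly aperiodic geometrically ergodic Markov chain and $(X_n^1)$ its split version. Let $\mathcal F$ be a countable class of functions $f\colon\mathcal X\to\mathbb R$ with an envelope $F$, and assume that for some $\alpha\in(0,1]$ the parameters $a$, $b$, $c$ for the function $F$ are finite. Denote

$$Z=\sup_{f\in\mathcal F}\Big|\sum_{i=0}^{n-1}f(X_i)\Big|.$$

Let $\varepsilon\in(0,1/2)$ and denote $M=c(3\alpha^{-2}\log n)^{1/\alpha}$ and

$$\sigma^2=\sup_{f\in\mathcal F}\mathbb Es_0(f)^2.$$

Then

$$\begin{aligned}
\mathbb P_x(Z>(1+7\varepsilon)\mathbb E_xZ+t)\le{}&\exp\Big(-\frac{(1-2\varepsilon)^4t^2}{2(1+\varepsilon)^2\pi^*(\alpha)n\sigma^2}\Big)+\exp\Big(-\frac{(1-2\varepsilon)^2t}{2M(1+\varepsilon^{-1})(3+4\varepsilon^{-1})}\Big)\\
&+e^8\exp\Big(-\frac{(\varepsilon(1-2\varepsilon)t)^\alpha}{2c^\alpha}\Big)+e\exp\Big(-\frac{\varepsilon^2n}{144\pi^*(\alpha)d^2}\Big)\\
&+2\exp\Big(-\frac{(\varepsilon t)^\alpha}{(2a)^\alpha}\Big)+2\pi^*(\alpha)^{-1}\exp\Big(-\frac{(\varepsilon t)^\alpha}{(2b)^\alpha}\Big)
\end{aligned}$$

for all $t>C(\varepsilon)$, where

$$C(\varepsilon)=\varepsilon^{-1}\Big((9+9e+27\pi^*(\alpha)d^2\varepsilon^{-1})\pi(F)+9\Gamma(1+1/\alpha)a+9\cdot2^{1/\alpha-1}e\,\Gamma(1+1/\alpha)\log^{1/\alpha}(e/\pi^*(\alpha))\,b\Big).$$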
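The threshold $C(\varepsilon)$ is explicit once the parameters are known. The minimal Python sketch below assumes all inputs are given; the function and argument names are ours. Note that the formula contains both the chain parameter $e$, passed as `e_par`, and Euler's number, `math.e`.

```python
import math

def threshold_C(eps, a, b, e_par, d, pi_atom, pi_F, alpha):
    """C(eps) of Theorem 7.  e_par is the chain parameter e = ||sigma(0)||_{psi_1},
    pi_F stands for pi(F); Euler's number (math.e) also enters the last term."""
    assert 0.0 < eps < 0.5 and 0.0 < alpha <= 1.0 and 0.0 < pi_atom <= 1.0
    g = math.gamma(1.0 + 1.0 / alpha)
    term_F = (9.0 + 9.0 * e_par + 27.0 * pi_atom * d ** 2 / eps) * pi_F
    term_a = 9.0 * g * a
    term_b = (9.0 * 2.0 ** (1.0 / alpha - 1.0) * math.e * g
              * math.log(math.e / pi_atom) ** (1.0 / alpha) * b)
    return (term_F + term_a + term_b) / eps

# Illustrative parameter values only.
example = threshold_C(0.1, a=1.0, b=1.0, e_par=2.0, d=3.0, pi_atom=0.5, pi_F=1.0, alpha=0.5)
```

Since the $d^2$ term scales like $\varepsilon^{-2}$, halving $\varepsilon$ roughly quadruples $C(\varepsilon)$ when that term dominates.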



Remark. Let us now discuss certain aspects of the above rather technical (and not very friendly looking) theorem, which may help to understand its applicability and limitations.

First, the involved form of the estimate results from the effort to obtain a probability bound with the following property: for small $t$ it exhibits subgaussian behavior, with the subgaussian parameter arbitrarily close to the weak variance. (For Donsker classes of functions, the weak variance is responsible for the concentration of the limiting Gaussian distribution.) By choosing $\varepsilon$ small enough one can approach the right subgaussian coefficient arbitrarily closely. The cost is twofold: the constants in the other terms get worse, and the interval on which the estimate behaves as in the Gaussian case shrinks. The length of this interval is of order $n^{1/(2-\alpha)}$ for $\alpha<1$ and $n/\log n$ for $\alpha=1$, which is greater than the CLT scaling $\sqrt n$.

Second, let us note that the fourth term of the estimate does not involve $t$ but depends only on $n$. This is a consequence of our method, and we do not know whether one could remove this term completely without worsening the dependence on the parameters $a,\ldots,d$ in the other terms. It is relatively easy to replace this term, e.g. by $\exp(-\varepsilon^2t/(C\pi^*(\alpha)d^2c))$, where $C$ is a universal constant. However, standard applications of this type of tail estimate involve $t$ of order at most $n$, and for such $t$ the 'troublesome' term is dominated by the terms involving $t$. We therefore do not pursue this direction here (see [2], where inequalities of this type are proven in a slightly different context).

Finally, we would like to comment on the threshold $C(\varepsilon)$ appearing in the estimates. It is of order $\varepsilon^{-2}$ and independent of $n$, so it does not pose a problem in typical applications (when $t$ is of order $n^\gamma$). Moreover, even without this threshold the estimate would not yield any information for $t$ of order smaller than $\varepsilon^{-2}$, since the denominator of the exponent in the second term is of order $\varepsilon^{-2}$. At present we do not know whether the dependence of the threshold and the estimates on $\varepsilon$ and on the parameters $d$, $e$ and $\pi^*(\alpha)$ can be improved. Note that $d$ and $e$ are not related just to the function $F$; they are 'global' parameters of the chain.

Let us now pass to the proof of Theorem 7. The next lemma strengthens an analogous estimate from [1]; in particular, it incorporates the dependence on $\varepsilon$ into the inequality. We will use it to compare the expectation of the random sum introduced by the regeneration method with that of the random variable $Z$.

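Lemma 6. Let $(X_n)$ be a strongly aperiodic Harris ergodic Markov chain and $(X_n^1)$ its split version. Let $\mathcal F$ be a countable class of functions $f\colon\mathcal X\to\mathbb R$ with an envelope $F$. Then for every $\varepsilon\in(0,1/2)$,

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}\le(1+4\varepsilon)\Big(\mathbb E_{x^*}\Big\|\sum_{i=1}^{N}s_{i-1}(f)\Big\|_{\mathcal F}+\Big(2+2\mathbb E_{x^*}\sigma(0)+\frac{3\pi^*(\alpha)\mathbb E(\sigma(1)-\sigma(0))^2}{\varepsilon}\Big)\pi(F)\Big).$$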



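Proof. Let $n_1=\lfloor(1-\varepsilon)\pi^*(\alpha)n\rfloor$ and $n_2=\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor$, and denote $\mathcal B=\sigma(\sum_{i=1}^{n_2}s_{i-1}(f):f\in\mathcal F)$. By Jensen's inequality and the exchangeability of the random vectors $(s_i(f))_{f\in\mathcal F}$, $i=0,\ldots,n-1$, if $n_1+1\le n_2$ we have

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_1+1}s_{i-1}(f)\Big\|_{\mathcal F}\ge\mathbb E_{x^*}\Big\|\mathbb E\Big(\sum_{i=1}^{n_1+1}s_{i-1}(f)\,\Big|\,\mathcal B\Big)\Big\|_{\mathcal F}=\frac{n_1+1}{n_2}\,\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_2}s_{i-1}(f)\Big\|_{\mathcal F}.$$

Combining the above inequality with the trivial case $n_1=n_2$ we get

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_2}s_{i-1}(f)\Big\|_{\mathcal F}\le\max\Big(\frac{n_2}{n_1+1},1\Big)\Big(\mathbb E_{x^*}s_0(F)+\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\Big).\qquad(32)$$

Moreover,

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\le\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N\ge n_1\}}+\mathbb E_{x^*}\Big\|\sum_{i=1}^{N}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N<n_1\}}+\mathbb E_{x^*}\Big\|\sum_{i=N+1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N<n_1\}}.\qquad(33)$$

By Doob's optional sampling theorem,

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N\ge n_1\}}\le\mathbb E_{x^*}\Big\|\sum_{i=1}^{N}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N\ge n_1\}}.\qquad(34)$$

Moreover, by the independence of $(s_i(f))_{f\in\mathcal F,\,i\ge N}$ and $N$, the third summand on the right-hand side of (33) does not exceed

$$\mathbb E_{x^*}(n_1-N)_+\,\mathbb E_{x^*}s_0(F)=\pi^*(\alpha)^{-1}\pi(F)\,\mathbb E_{x^*}(n_1-N)_+.$$

To bound $\mathbb E_{x^*}(n_1-N)_+$ we can proceed as in the proof of Lemma 5; this time, however, we do not need exponential inequalities. Set again $T_i=\sigma(i)-\sigma(i-1)$ for $i\ge1$, and additionally $T_0=\sigma(0)$. For $t\in(0,n_1/(\pi^*(\alpha)n))$ we have

$$\begin{aligned}
\mathbb P_{x^*}((n_1-N)_+>\pi^*(\alpha)nt)&=\mathbb P_{x^*}(N<n_1-\pi^*(\alpha)nt)\\
&\le\mathbb P_{x^*}(\sigma(\lfloor n_1-\pi^*(\alpha)nt\rfloor)>n-1)=\mathbb P_{x^*}\Big(\sum_{i=0}^{\lfloor n_1-\pi^*(\alpha)nt\rfloor}T_i>n-1\Big)\\
&\le\mathbb P_{x^*}(T_0>tn/2-1)+\mathbb P_{x^*}\Big(\sum_{i=1}^{\lfloor n_1-\pi^*(\alpha)nt\rfloor}T_i>(1-t/2)n\Big)\\
&\le\mathbb P_{x^*}(2(T_0+1)>tn)+\mathbb P_{x^*}\Big(\sum_{i=1}^{\lfloor n_1-\pi^*(\alpha)nt\rfloor}(T_i-\mathbb ET_i)>(1-t/2)n-(1-\varepsilon-t)n\Big)\\
&\le\mathbb P_{x^*}(2(T_0+1)>tn)+\frac{n_1\mathbb ET_1^2}{n^2(\varepsilon+t/2)^2}\le\mathbb P_{x^*}(2(T_0+1)>tn)+\frac{\pi^*(\alpha)\mathbb ET_1^2}{n(\varepsilon+t/2)^2},
\end{aligned}$$

where we used Chebyshev's inequality and the fact that $\mathbb ET_i=\pi^*(\alpha)^{-1}$ for $i>0$. Now

$$\begin{aligned}
\mathbb E_{x^*}(n_1-N)_+&=\pi^*(\alpha)n\int_0^\infty\mathbb P_{x^*}((n_1-N)_+>\pi^*(\alpha)nt)\,dt\\
&\le\pi^*(\alpha)n\int_0^\infty\mathbb P_{x^*}(2(T_0+1)>tn)\,dt+\pi^*(\alpha)n\int_0^\infty\frac{\pi^*(\alpha)\mathbb ET_1^2}{n(\varepsilon+t/2)^2}\,dt\\
&\le2\pi^*(\alpha)(\mathbb E_{x^*}T_0+1)+2\varepsilon^{-1}\pi^*(\alpha)^2\mathbb ET_1^2.
\end{aligned}$$

Thus

$$\mathbb E_{x^*}\Big\|\sum_{i=N+1}^{n_1}s_{i-1}(f)\Big\|_{\mathcal F}\mathbf 1_{\{N<n_1\}}\le2\Big(1+\mathbb E_{x^*}T_0+\frac{\pi^*(\alpha)\mathbb ET_1^2}{\varepsilon}\Big)\pi(F).$$

We now combine this estimate with (32), (33) and (34), and with the inequality $\mathbb E_{x^*}s_0(F)=\pi^*(\alpha)^{-1}\pi(F)\le\pi^*(\alpha)\mathbb ET_1^2\pi(F)$, which follows from $\mathbb ET_1^2\ge(\mathbb ET_1)^2$. We get

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{n_2}s_{i-1}(f)\Big\|_{\mathcal F}\le\max\Big(\frac{n_2}{n_1+1},1\Big)\Big(\mathbb E_{x^*}\Big\|\sum_{i=1}^{N}s_{i-1}(f)\Big\|_{\mathcal F}+\Big(2+2\mathbb E_{x^*}T_0+\frac{3\pi^*(\alpha)\mathbb ET_1^2}{\varepsilon}\Big)\pi(F)\Big).$$

To finish the proof it suffices to note that for $\varepsilon\in(0,1/2)$,

$$\frac{n_2}{n_1+1}\le\frac{(1+\varepsilon)n\pi^*(\alpha)}{(1-\varepsilon)n\pi^*(\alpha)}\le1+4\varepsilon.\qquad\square$$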
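Two elementary inequalities in $\varepsilon$ are used here and in the proof of Theorem 7: $(1+\varepsilon)/(1-\varepsilon)\le1+4\varepsilon$ and $(1+4\varepsilon)(1+\varepsilon)\le1+7\varepsilon$. Both reduce to $4\varepsilon^2\le2\varepsilon$, i.e. $\varepsilon\le1/2$. A short Python check on a grid (helper names are ours):

```python
def ratio_bound(eps):
    """(1 + eps)/(1 - eps) <= 1 + 4 eps  <=>  4 eps^2 <= 2 eps  <=>  eps <= 1/2."""
    return (1.0 + eps) / (1.0 - eps) <= 1.0 + 4.0 * eps + 1e-12

def product_bound(eps):
    """(1 + 4 eps)(1 + eps) <= 1 + 7 eps  <=>  4 eps^2 <= 2 eps  <=>  eps <= 1/2."""
    return (1.0 + 4.0 * eps) * (1.0 + eps) <= 1.0 + 7.0 * eps + 1e-12

grid = [k / 1000 for k in range(1, 501)]      # eps in (0, 1/2]
assert all(ratio_bound(x) and product_bound(x) for x in grid)
assert not ratio_bound(0.6) and not product_bound(0.6)
```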
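Corollary 3. In the setting of Theorem 7, for every $\varepsilon\in(0,1/2)$,

$$\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}\le(1+4\varepsilon)\Big(\mathbb E_{x^*}\Big\|\sum_{i=1}^{N}s_{i-1}(f)\Big\|_{\mathcal F}+(2+2e+6\pi^*(\alpha)d^2\varepsilon^{-1})\pi(F)\Big).$$

Proof. We use Lemma 6 together with the inequality

$$1+\frac{\mathbb E|X|}{\|X\|_{\psi_1}}+\frac{\mathbb EX^2}{2\|X\|_{\psi_1}^2}\le2,$$

which holds for every random variable with $\|X\|_{\psi_1}\ne0$ (it follows from $e^x\ge1+x+x^2/2$ for $x\ge0$). Applied to $X=\sigma(0)$ under $\mathbb P_{x^*}$ and to $X=\sigma(1)-\sigma(0)$, it gives $\mathbb E_{x^*}\sigma(0)\le e$ and $\mathbb E(\sigma(1)-\sigma(0))^2\le2d^2$. □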
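The inequality used in the proof of Corollary 3 can be illustrated numerically. The Python sketch below uses helper names of our choosing and assumes a finitely supported, not identically zero random variable with moderate values. It computes the $\psi_1$ norm by bisection and evaluates $1+\mathbb E|X|/\|X\|_{\psi_1}+\mathbb EX^2/(2\|X\|_{\psi_1}^2)$. For $X\equiv1$ the norm is $1/\log2$.

```python
import math

def psi1_norm(values, probs, tol=1e-12):
    """Orlicz psi_1 norm inf{s > 0 : E exp(|X|/s) <= 2} of a finitely supported,
    not identically zero X (moderate values, to avoid overflow), by bisection."""
    def mgf(s):
        return sum(p * math.exp(abs(v) / s) for v, p in zip(values, probs))
    lo, hi = 0.0, 1.0
    while mgf(hi) > 2.0:            # E exp(|X|/s) decreases in s: grow hi until feasible
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:       # invariant: hi is feasible, lo is not (or lo = 0)
        mid = 0.5 * (lo + hi)
        if mgf(mid) > 2.0:
            lo = mid
        else:
            hi = mid
    return hi

def moment_lhs(values, probs):
    """1 + E|X|/||X|| + E X^2 / (2 ||X||^2), which the proof bounds by 2."""
    s = psi1_norm(values, probs)
    m1 = sum(p * abs(v) for v, p in zip(values, probs))
    m2 = sum(p * v * v for v, p in zip(values, probs))
    return 1.0 + m1 / s + m2 / (2.0 * s * s)
```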
Lemma 7. In the setting of Theorem 7,

$$\mathbb E_{x^*}U_n(F)\le2\Gamma(1+1/\alpha)a$$

and

$$\mathbb E_{x^*}W_n(F)\le2^{1/\alpha}e\,\Gamma(1+1/\alpha)\log^{1/\alpha}(e/\pi^*(\alpha))\,b,$$

where $\Gamma(z)=\int_0^\infty t^{z-1}\exp(-t)\,dt$ is the Euler Gamma function.

Proof. The first estimate follows from three ingredients: integration by parts, the estimate $\mathbb P_{x^*}(U_n(F)>t)\le2\exp(-(t/a)^\alpha)$, and the formula for the expectation of Weibull variables. The second one is analogous. One simply needs to note that the inequality $\mathbb P_{x^*}(W_n(F)>t)\le2\pi^*(\alpha)^{-1}\exp(-(t/b)^\alpha)$, obtained as in the proof of Theorem 5, implies that

$$\mathbb P_{x^*}(W_n(F)>t)\le\exp\Big(1-\frac{t^\alpha}{2\log(e\pi^*(\alpha)^{-1})b^\alpha}\Big).$$

□
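Both steps of this proof can be checked numerically with a small Python sketch (illustrative only; the function names are ours). The first check is the Weibull-type integral $\int_0^\infty2\exp(-(t/a)^\alpha)\,dt=2a\Gamma(1+1/\alpha)$, computed after the substitution $u=(t/a)^\alpha$. The second is the domination $\min(1,2\pi^{-1}e^{-x})\le\exp(1-x/(2\log(e/\pi)))$, with $x=(t/b)^\alpha$ and $\pi=\pi^*(\alpha)$. The quadrature is most accurate when $1/\alpha$ is an integer.

```python
import math

def tail_integral(scale, alpha, u_max=60.0, steps=100_000):
    """int_0^inf 2 exp(-(t/scale)^alpha) dt via u = (t/scale)^alpha, i.e.
    (2 scale / alpha) * int_0^inf exp(-u) u^(1/alpha - 1) du, truncated at u_max
    and computed with the trapezoidal rule (alpha in (0, 1])."""
    k = 1.0 / alpha - 1.0
    h = u_max / steps

    def g(u):
        return math.exp(-u) * u ** k

    total = 0.5 * (g(0.0) + g(u_max)) + sum(g(j * h) for j in range(1, steps))
    return 2.0 * scale / alpha * total * h

def tail_dominated(x, pi_atom):
    """min(1, 2/pi * e^{-x}) <= exp(1 - x / (2 log(e/pi))) for x >= 0, 0 < pi <= 1."""
    lhs = min(1.0, 2.0 / pi_atom * math.exp(-x))
    rhs = math.exp(1.0 - x / (2.0 * math.log(math.e / pi_atom)))
    return lhs <= rhs * (1.0 + 1e-12)

for alpha in (1.0, 0.5):
    assert math.isclose(tail_integral(2.0, alpha), 2 * 2.0 * math.gamma(1 + 1 / alpha), rel_tol=1e-6)
for pi_atom in (1.0, 0.5, 0.1, 1e-3):
    assert all(tail_dominated(0.05 * j, pi_atom) for j in range(4001))
```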
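Proof of Theorem 7. As in the case of a single additive functional, we decompose

$$Z:=\sup_{f\in\mathcal F}\Big|\sum_{i=0}^{n-1}f(X_i)\Big|\le|U_n(F)|+\sup_{f\in\mathcal F}|V_n(f)|+|W_n(F)|.$$

Similarly as in the proof of Theorem 5 we get

$$\mathbb P_{x^*}(|U_n(F)|>\varepsilon t/2)\le2\exp\Big(-\frac{(\varepsilon t)^\alpha}{(2a)^\alpha}\Big)\qquad(35)$$

and

$$\mathbb P_{x^*}(|W_n(F)|>\varepsilon t/2)\le2\pi^*(\alpha)^{-1}\exp\Big(-\frac{(\varepsilon t)^\alpha}{(2b)^\alpha}\Big).\qquad(36)$$

Let us also note that by Lemma 7

$$\mathbb E_{x^*}\sup_{f\in\mathcal F}\Big|\sum_{i=0}^{n-1}f(X_i)\Big|\ge\mathbb E_{x^*}\sup_{f\in\mathcal F}|V_n(f)|-2\Gamma(1+1/\alpha)a-2^{1/\alpha}e\,\Gamma(1+1/\alpha)\log^{1/\alpha}(e/\pi^*(\alpha))\,b.$$

By Corollary 3 (and $1+4\varepsilon\le3$) we get

$$(1+4\varepsilon)\mathbb E_{x^*}\sup_{f\in\mathcal F}|V_n(f)|\ge\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}-(6+6e+18\pi^*(\alpha)d^2\varepsilon^{-1})\pi(F),$$

which together with the previous estimate gives

$$\begin{aligned}
(1+4\varepsilon)\mathbb E_{x^*}Z\ge{}&\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}-(6+6e+18\pi^*(\alpha)d^2\varepsilon^{-1})\pi(F)\\
&-6\Gamma(1+1/\alpha)a-3\cdot2^{1/\alpha}e\,\Gamma(1+1/\alpha)\log^{1/\alpha}(e/\pi^*(\alpha))\,b.
\end{aligned}$$

The subtracted quantity equals $\frac23\varepsilon C(\varepsilon)$. We combine this with the trivial bound $(1+4\varepsilon)(1+\varepsilon)\le1+7\varepsilon$ for $\varepsilon\in(0,1/2)$ and with $1+\varepsilon\le3/2$. For $t>C(\varepsilon)$ this gives

$$\begin{aligned}
\{Z>(1+7\varepsilon)\mathbb E_{x^*}Z+t\}&\subseteq\Big\{\sup_{f\in\mathcal F}|V_n(f)|>(1+\varepsilon)\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}-\varepsilon C(\varepsilon)+(1-\varepsilon)t\Big\}\\
&\qquad\cup\{|U_n(F)|>\varepsilon t/2\}\cup\{|W_n(F)|>\varepsilon t/2\}\\
&\subseteq A\cup\{|U_n(F)|>\varepsilon t/2\}\cup\{|W_n(F)|>\varepsilon t/2\},
\end{aligned}$$

where

$$A:=\Big\{\sup_{f\in\mathcal F}|V_n(f)|>(1+\varepsilon)\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}+(1-2\varepsilon)t\Big\}.$$

By Lemma 5 (applied with $\varepsilon/2$ instead of $\varepsilon$) we get

$$\mathbb P_{x^*}(N>(1+\varepsilon)\pi^*(\alpha)n)\le e\exp\Big(-\frac{\varepsilon^2n}{144\pi^*(\alpha)d^2}\Big),$$

which gives

$$\mathbb P_{x^*}(A)\le\mathbb P_{x^*}(A\cap\{N\le(1+\varepsilon)\pi^*(\alpha)n\})+e\exp\Big(-\frac{\varepsilon^2n}{144\pi^*(\alpha)d^2}\Big).\qquad(37)$$

We now apply Proposition 4 to the $\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor$ i.i.d. blocks, with $t$ replaced by $t(1-2\varepsilon)$. It gives

$$\begin{aligned}
\mathbb P_{x^*}(A\cap\{N\le(1+\varepsilon)\pi^*(\alpha)n\})&\le\mathbb P_{x^*}\Big(\max_{k\le\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}\sup_{f\in\mathcal F}\Big|\sum_{i=1}^{k}s_{i-1}(f)\Big|>(1+\varepsilon)\mathbb E_{x^*}\Big\|\sum_{i=1}^{\lfloor(1+\varepsilon)\pi^*(\alpha)n\rfloor}s_{i-1}(f)\Big\|_{\mathcal F}+t(1-2\varepsilon)\Big)\\
&\le\exp\Big(-\frac{(1-2\varepsilon)^4t^2}{2(1+\varepsilon)^2\pi^*(\alpha)n\sigma^2}\Big)+\exp\Big(-\frac{(1-2\varepsilon)^2t}{2M(1+\varepsilon^{-1})(3+4\varepsilon^{-1})}\Big)+e^8\exp\Big(-\frac{(\varepsilon(1-2\varepsilon)t)^\alpha}{2c^\alpha}\Big),
\end{aligned}$$

which in combination with (35), (36) and (37) ends the proof. □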